This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Commit edf177e

Authored by Satrat, dbogunowicz, rahul-tuli, dsikka, and bfineran
[Cherry Picks] Analyze Bug Fixes (Updated) (#465)

* `RegistryMixin` improved alias management: add docstrings, simplify, harden, format registry lookup strings to lowercase, standardize aliases (#404)
* Move evaluator registry (#411)
* More control over external data size (#412)
* When splitting external data, avoid renaming `model.data` to `model.data.1` if only one external data file eventually gets saved (#414)
* [model.download] Fix function returning nothing (#420)
* [BugFix] Path not expanded (#418)
* [Fix] Allow for processing Path in the sparsezoo analysis (#417)
* Raise TypeError instead of ValueError (#426)
* Fix misleading docstring and add a test (#416)
* Add support for `benchmark.yaml` (#415): recent zoo models use `benchmark.yaml` instead of `benchmarks.yaml`, so add this additional pathway to the bulk model download; also update the files filter and fix tests
* [BugFix] Add analyze to init; move onnxruntime to deps (#421)
* Print model analysis at the end of the CLI run (#423)
* Omit scalar weights (#424)
* Update analyze help message for correctness (#432)
* Initial commit (#430)
* [sparsezoo.analyze] Fix pathway such that it works for larger models (#437)
* Delete hehe.py (#439)
* Download deployment dir for LLMs; use path instead of download (#435)
* Only set save_as_external_data to true if the model originally had external data (#442)
* Add channel-wise quantization support (#441)
* Chunk download (#429): break each download into chunks, fetch chunks as thread-based jobs, then combine and delete the chunks; handle files smaller than the chunk size
* Fix type hints (#445)
* Fix bug if the value is a dict (#447)
* [deepsparse.analyze] Fix v1 functionality to work with LLMs (#451): apply the equivalent analyze_v2 changes so the inference session works for LLMs, and downgrade warnings to debug printouts
* Overwrite file (#450)
* Add a `numpy_array_representer` to yaml at runtime, to avoid serialization issues (#454)
* Avoid division by zero and log of zero (#457)
* Op analysis total counts had double sparse counts (#461)
* Rename legacy analyze to analyze_v1 (#459)
* Fix quant % calculation (#462)
* Include sparsity in size calculation (#463)
* Revert "Merge branch 'main' into analyze_cherry_picks" (reverts commit 509fa1a, reversing changes made to 08f94c4)

Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>
Co-authored-by: dbogunowicz <damian@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: George <george@neuralmagic.com>
Co-authored-by: 21 <a21@21s-MacBook-Pro.local>
1 parent 44b7972 commit edf177e

File tree

6 files changed: +58 −39 lines changed

src/sparsezoo/analyze_v2/memory_access_analysis.py

Lines changed: 8 additions & 5 deletions
```diff
@@ -73,7 +73,7 @@ def get_quantization(self) -> List["QuantizationAnalysisSchema"]:
         :returns: List of quantization analysis pydantic models for each grouping
             if the node has weights
         """
-        data = get_memeory_access_bits(self.model_graph, self.node, self.node_shape)
+        data = get_memory_access_bits(self.model_graph, self.node, self.node_shape)
         if data is not None:
             quantization_analysis_model = []
             for grouping, counts_dict in data.items():
@@ -152,7 +152,7 @@ def get_memory_access_counts(
     }


-def get_memeory_access_bits(
+def get_memory_access_bits(
     model_graph: ONNXGraph,
     node: NodeProto,
     node_shape: Dict,
@@ -164,12 +164,15 @@ def get_memeory_access_bits(
     )
     node_weight = get_node_weight(model_graph, node)
     precision = get_numpy_quantization_level(node_weight)
-    bits = memory_access_counts["single"]["counts"] * precision
-    bits_quant = bits * is_quantized_layer(model_graph, node)
+    counts = memory_access_counts["single"]["counts"]
+    bits = counts * precision
+    is_quantized = is_quantized_layer(model_graph, node)

     return {
         "tensor": {
             "bits": bits,
-            "bits_quant": bits_quant,
+            "bits_quant": bits * is_quantized,
+            "counts": counts,
+            "counts_quant": counts * is_quantized,
         }
     }
```
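The corrected return shape can be sketched in isolation. The helper below is hypothetical (the real function derives `counts`, `precision`, and the quantized flag from the ONNX graph); it only illustrates that raw element counts are now reported alongside bit totals:

```python
def memory_access_bits_summary(counts: int, precision: int, is_quantized: bool) -> dict:
    """Sketch of the fixed get_memory_access_bits return value: counts and
    counts_quant travel with bits/bits_quant so that downstream percentage
    math can work over element counts rather than bit totals."""
    bits = counts * precision
    return {
        "tensor": {
            "bits": bits,
            "bits_quant": bits * is_quantized,
            "counts": counts,
            "counts_quant": counts * is_quantized,
        }
    }

# e.g. 1000 memory accesses at int8 precision on a quantized layer
summary = memory_access_bits_summary(1000, 8, True)
```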

src/sparsezoo/analyze_v2/model_analysis.py

Lines changed: 7 additions & 7 deletions
```diff
@@ -78,10 +78,10 @@ def calculate_sparsity_percentage(self, category: Dict):
         counts = category["counts"]
         return (counts_sparse / counts) * 100 if counts != 0 else 0

-    def calculate_quantized_percentage(self, tensor: Dict):
-        bits_quant = tensor["bits_quant"]
-        bits = tensor["bits"]
-        return (bits_quant / bits) * 100 if bits != 0 else 0
+    def calculate_quantized_percentage(self, tensor: Dict, counts_prefix: str):
+        counts_quant = tensor[f"{counts_prefix}_quant"]
+        counts = tensor[counts_prefix]
+        return (counts_quant / counts) * 100 if counts != 0 else 0

     def __repr__(self):
         data = self.to_dict()
@@ -93,7 +93,7 @@ def __repr__(self):
         )
         param_size = summaries["params"]["quantization"]["tensor"]["bits"]
         param_quantized = self.calculate_quantized_percentage(
-            summaries["params"]["quantization"]["tensor"]
+            summaries["params"]["quantization"]["tensor"], "counts"
         )

         ops_total = summaries["ops"]["sparsity"]["single"]["counts"]
@@ -102,7 +102,7 @@ def __repr__(self):
         )
         ops_size = summaries["ops"]["quantization"]["tensor"]["bits"]
         ops_quantized = self.calculate_quantized_percentage(
-            summaries["ops"]["quantization"]["tensor"]
+            summaries["ops"]["quantization"]["tensor"], "counts"
         )

         mem_access_total = summaries["mem_access"]["sparsity"]["single"]["counts"]
@@ -111,7 +111,7 @@ def __repr__(self):
         )
         mem_access_size = summaries["mem_access"]["quantization"]["tensor"]["bits"]
         mem_access_quantized = self.calculate_quantized_percentage(
-            summaries["mem_access"]["quantization"]["tensor"]
+            summaries["mem_access"]["quantization"]["tensor"], "counts"
         )

         return (
```
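The quant % fix (#462) switches the ratio from bit totals to element counts. A standalone sketch of the new method body, with an illustrative tensor dict (the sample numbers are made up: 750 of 1000 elements quantized to int8, the rest fp32):

```python
def calculate_quantized_percentage(tensor: dict, counts_prefix: str) -> float:
    """Sketch of the fixed method: the percentage is taken over element
    counts (counts_prefix="counts"), so mixed-precision layers no longer
    skew the ratio the way a bits-based quotient did."""
    counts_quant = tensor[f"{counts_prefix}_quant"]
    counts = tensor[counts_prefix]
    return (counts_quant / counts) * 100 if counts != 0 else 0

# 750/1000 quantized elements -> 75%; the old bits-based ratio on the same
# layer would be 6000/14000 (750*8 quantized bits over 250*32 + 750*8 total)
tensor = {"counts": 1000, "counts_quant": 750, "bits": 14000, "bits_quant": 6000}
```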

src/sparsezoo/analyze_v2/operation_analysis.py

Lines changed: 14 additions & 13 deletions
```diff
@@ -166,22 +166,23 @@ def get_operation_bits(
         precision = get_numpy_quantization_level(node_weight)
         is_quantized_op = "32" not in str(precision)

-        bits = (
-            ops["single"]["counts"] + ops["single"]["counts_sparse"]
-        ) * precision
-
-        bits_block4 = (
-            ops["block4"]["counts"] + ops["block4"]["counts_sparse"]
-        ) * precision
-
-        bits_quant = is_quantized_op * bits
+        single_counts = ops["single"]["counts"]
+        single_counts_sparse = ops["single"]["counts_sparse"]
+        single_bits = (single_counts - single_counts_sparse) * precision
+        block4_counts = ops["block4"]["counts"]
+        block4_counts_sparse = ops["block4"]["counts_sparse"]
+        block4_bits = (block4_counts - block4_counts_sparse) * precision
         return {
             "tensor": {
-                "bits": bits,
-                "bits_quant": bits_quant,
+                "counts": single_counts,
+                "counts_quant": is_quantized_op * single_counts,
+                "bits": single_bits,
+                "bits_quant": is_quantized_op * single_bits,
             },
             "block4": {
-                "bits": bits_block4,
-                "bits_quant": bits_quant,
+                "counts": block4_counts,
+                "counts_quant": is_quantized_op * block4_counts,
+                "bits": block4_bits,
+                "bits_quant": is_quantized_op * block4_bits,
            },
        }
```
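The core of this hunk is the double-counting fix (#461): sparse op counts were previously *added* to the dense counts before multiplying by precision, and are now *subtracted*, which also folds sparsity into the size estimate (#463). A minimal sketch of just that arithmetic, with made-up numbers:

```python
def operation_bits(counts: int, counts_sparse: int, precision: int) -> int:
    """Sketch of the fix: pruned (sparse) ops contribute no stored bits,
    so the size is (counts - counts_sparse) * precision. The old code
    computed (counts + counts_sparse) * precision, double-counting the
    sparse portion."""
    return (counts - counts_sparse) * precision

# 1000 ops with 400 pruned away at fp32: 600 dense ops contribute bits,
# whereas the buggy formula would have charged for 1400
dense_bits = operation_bits(1000, 400, 32)
```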

src/sparsezoo/analyze_v2/parameter_analysis.py

Lines changed: 10 additions & 7 deletions
```diff
@@ -29,7 +29,7 @@
     get_node_num_four_block_zeros_and_size,
     get_node_param_counts,
     get_node_weight,
-    get_node_weight_bits,
+    get_node_weight_precision,
     get_numpy_distribution_statistics,
     get_numpy_entropy,
     get_numpy_modes,
@@ -153,14 +153,17 @@ def get_parameter_bits(
     If the layer is quantized, assume all its elements in the ndarray
     are quantized
     """
-    node_weight = get_node_weight(model_graph, node)
-    if node_weight is not None and node_weight.size > 0:
-        bits = get_node_weight_bits(model_graph, node)
-
+    num_weights, num_bias, num_sparse_weights = get_node_param_counts(node, model_graph)
+    if num_weights > 0:
+        precision = get_node_weight_precision(model_graph, node)
+        is_quantized = is_quantized_layer(model_graph, node)
+        num_non_sparse_weights = num_weights - num_sparse_weights + num_bias
         return {
             "tensor": {
-                "bits": bits,
-                "bits_quant": bits * is_quantized_layer(model_graph, node),
+                "counts": num_weights,
+                "counts_quant": num_weights * is_quantized,
+                "bits": num_non_sparse_weights * precision,
+                "bits_quant": num_non_sparse_weights * precision * is_quantized,
            },
        }
```
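The parameter-size change mirrors the op-analysis one: sparse weights are excluded from the stored-bits estimate (they compress away), while `counts` still reports the full weight total. A standalone sketch of the new body, with the graph-derived inputs passed in directly (a hypothetical helper, not the real signature):

```python
def parameter_bits(num_weights: int, num_bias: int, num_sparse: int,
                   precision: int, is_quantized: bool) -> dict:
    """Sketch of the fixed get_parameter_bits: size is computed over the
    non-sparse weights plus bias, at the layer's per-element precision;
    counts report all weights regardless of sparsity."""
    num_non_sparse = num_weights - num_sparse + num_bias
    return {
        "tensor": {
            "counts": num_weights,
            "counts_quant": num_weights * is_quantized,
            "bits": num_non_sparse * precision,
            "bits_quant": num_non_sparse * precision * is_quantized,
        }
    }

# 1000 int8 weights, 10 bias values, 400 of the weights pruned:
# 610 stored elements at 8 bits each
result = parameter_bits(1000, 10, 400, 8, True)
```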

src/sparsezoo/analyze_v2/schemas/quantization_analysis.py

Lines changed: 15 additions & 3 deletions
```diff
@@ -20,6 +20,14 @@


 class QuantizationSummaryAnalysisSchema(BaseModel):
+    counts: float = Field(..., description="Total number of weights")
+    counts_quant: int = Field(
+        ...,
+        description=(
+            "Total number of quantized weights."
+            "Here we assume if the layer is quantized, the entire array is quantized"
+        ),
+    )
     bits: float = Field(..., description="Total bits required to store the weights")
     bits_quant: int = Field(
         ...,
@@ -39,9 +47,9 @@ def validate_types(cls, value):
     @validator("percent", pre=True, always=True)
     def calculate_percent_if_none(cls, value, values):
         if value is None:
-            bits = values.get("bits", 0)
-            bits_quant = values.get("bits_quant", 0)
-            return bits_quant / bits if bits > 0 else 0.0
+            counts = values.get("counts", 0)
+            counts_quant = values.get("counts_quant", 0)
+            return counts_quant / counts if counts > 0 else 0.0
         return value

     def __add__(self, model: BaseModel):
@@ -51,7 +59,9 @@ def __add__(self, model: BaseModel):

         if validator_model is not None:
             return validator_model(
+                counts=self.counts + model.counts,
                 bits=self.bits + model.bits,
+                counts_quant=self.counts_quant + model.counts_quant,
                 bits_quant=self.bits_quant + model.bits_quant,
             )

@@ -67,6 +77,8 @@ def __add__(self, model: BaseModel):
         if validator_model is not None and self.grouping == model.grouping:
             return validator_model(
                 grouping=self.grouping,
+                counts=self.counts + model.counts,
                 bits=self.bits + model.bits,
+                counts_quant=self.counts_quant + model.counts_quant,
                 bits_quant=self.bits_quant + model.bits_quant,
             )
```
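The schema changes add `counts`/`counts_quant` fields, derive the `percent` fallback from them, and aggregate them in `__add__`. A plain-dataclass sketch of that behavior (the real schema is a pydantic `BaseModel` with a `@validator`; this strips the validation machinery to show just the new arithmetic):

```python
from dataclasses import dataclass


@dataclass
class QuantizationSummary:
    """Dataclass sketch of QuantizationSummaryAnalysisSchema: counts fields
    sit alongside bits fields, percent falls back to a counts-based ratio,
    and addition aggregates both pairs."""
    counts: float
    counts_quant: int
    bits: float
    bits_quant: int

    @property
    def percent(self) -> float:
        # Mirrors calculate_percent_if_none: counts-based, zero-safe
        return self.counts_quant / self.counts if self.counts > 0 else 0.0

    def __add__(self, other: "QuantizationSummary") -> "QuantizationSummary":
        return QuantizationSummary(
            counts=self.counts + other.counts,
            counts_quant=self.counts_quant + other.counts_quant,
            bits=self.bits + other.bits,
            bits_quant=self.bits_quant + other.bits_quant,
        )


# two layers: fp32 with half quantized, and a fully quantized int8 layer
a = QuantizationSummary(counts=100, counts_quant=50, bits=3200, bits_quant=400)
b = QuantizationSummary(counts=100, counts_quant=100, bits=800, bits_quant=800)
total = a + b
```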

src/sparsezoo/utils/onnx/analysis.py

Lines changed: 4 additions & 4 deletions
```diff
@@ -48,7 +48,7 @@
     "get_numpy_distribution_statistics",
     "get_numpy_quantization_level",
     "get_numpy_bits",
-    "get_node_weight_bits",
+    "get_node_weight_precision",
     "get_node_param_counts",
     "get_node_kernel_shape",
 ]
@@ -485,13 +485,13 @@ def get_node_param_counts(
     return params, bias, sparse_params


-def get_node_weight_bits(
+def get_node_weight_precision(
     model_graph: ONNXGraph,
     node: NodeProto,
 ) -> int:
-    """Get the bits needed to store the node weights"""
+    """Get the precision of the node in number of bits"""
     node_weight = get_node_weight(model_graph, node)
-    return get_numpy_bits(node_weight)
+    return get_numpy_quantization_level(node_weight)


 def get_numpy_bits(arr: numpy.ndarray) -> int:
```
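The rename makes the per-element/total distinction explicit: precision is bits per weight, while storage is element count times precision. The stand-ins below are hypothetical illustrations of that distinction based on the numpy dtype (the real `get_numpy_quantization_level` and `get_numpy_bits` live in this module and may differ):

```python
import numpy


def numpy_quantization_level(arr: numpy.ndarray) -> int:
    """Hypothetical stand-in: precision in bits per element, from the dtype."""
    return arr.dtype.itemsize * 8


def numpy_bits(arr: numpy.ndarray) -> int:
    """Hypothetical stand-in: total storage bits, i.e. element count
    times per-element precision."""
    return arr.size * numpy_quantization_level(arr)


# a 4x4 int8 weight tensor: 8 bits of precision, 128 bits of storage
weights = numpy.zeros((4, 4), dtype=numpy.int8)
```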
