-
Notifications
You must be signed in to change notification settings - Fork 0
refactor: initializers module #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: refactor/rework-arch
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In examples we use imports from modules, not files inside module. That's why there is __init__.py files
| from rework_pysatl_mpest.optimizers import Optimizer, ScipyNelderMead | ||
|
|
||
|
|
||
| def _validate_clusters_distributions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed docstrings
| return valid_clusters, cluster_weights | ||
|
|
||
|
|
||
| def _calculate_cluster_fit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed docstrings
| H_k: np.ndarray, | ||
| optimizer: Optimizer, | ||
| ) -> tuple[dict[str, float], float]: | ||
| new_params = estimation_func(temp_model, X, H_k, optimizer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| new_params = estimation_func(temp_model, X, H_k, optimizer) | |
| param_names, param_values = estimation_func(temp_model, X, H_k, optimizer).items() |
| temp_model.set_params_from_vector(param_names, param_values) | ||
|
|
||
| log_probs = np.clip(temp_model.lpdf(X), -1e9, -1e-9) | ||
| weighted_log_likelihood = np.sum(H_k * log_probs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| weighted_log_likelihood = np.sum(H_k * log_probs) | |
| weighted_log_likelihood = np.dot(H_k, log_probs) |
| 5. Returns the initialized mixture model | ||
| """ | ||
| X = np.asarray(X, dtype=np.float64) | ||
| if X.ndim == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this inside self._clusterize(...)
Otherwise, your two-dimensional array X will be passed to the rest of the class's internal methods, which seem not to be suited for this.
| initializer = ClusterizeInitializer(is_accurate=True, is_soft=True, clusterizer=self.mock_clusterizer) | ||
|
|
||
| X = np.array([1.0, 2.0, 3.0]) | ||
| X = np.array([[1.0], [2.0], [3.0]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what?
This test is not intended to check various X formats.
Besides, why this particular test? It seems to me that tests should be written in the format the user will work with, assuming a one-dimensional array will be passed.
| ) | ||
|
|
||
| effective_n = cluster_weights[k] | ||
| score = weighted_log_likelihood / effective_n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there an additional division here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor the constructor and the perform signature:
The user should pass the default optimizer and clusterer to the constructor, and pass them to perform if they want to use other ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary to add a check that the clusterer has the required methods in the constructor and in perform, since the object type is Any
| X: ArrayLike, | ||
| dists: list[ContinuousDistribution], | ||
| cluster_match_info: ClusterMatchStrategy, | ||
| cluster_match_info: MatchingMethod, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgot to update the type hint in the docs
| for model, est_func in zip(models, estimation_strategies): | ||
| row: list[FitResult] = [] | ||
| for k in valid_clusters: | ||
| cache_key = (model.__class__, est_func, k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idk, but what happens if models contains multiple instances of the same class? It seems like they would share the same cache key.
| score_func: ScoringMethod, | ||
| estimation_strategies: list[EstimationStrategy], | ||
| optimizer: Optimizer = ScipyNelderMead(), | ||
| ) -> MixtureModel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument names here do not match the abstract base class definition in Initializer
| model_weights: list[float] = [] | ||
| used_clusters = set() | ||
|
|
||
| for model, estimation_func in zip(context["models"], context["estimation_strategies"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The greedy strategy currently iterates sequentially. Was this order-dependency intended?
…ter_fit and _validate_clusters_distributions
…eparated into greedy, hungarian, permutation methods, also for scoring functions aic and likelihood feat: Enums for matching methods, scoring methods tests: tests that include cluster_match_strategy commented/deleted � Conflicts: � rework_pysatl_mpest/initializers/cluster_match_strategy.py � rework_pysatl_mpest/initializers/clusterize_initializer.py
remove models permutations from permutation match strategy, remove conversion from match strategies, add clusterizer and optimizer choose in both class constructor and perform method, changed numerical constant on adequate, small test fix (due to the constant change, the tests fails)
64cc60a to
e054893
Compare
No description provided.