Saul inference: moving beyond LBJava's inference #401
Saul inference: moving beyond LBJava's inference #401danyaljj wants to merge 71 commits intoCogComp:masterfrom danyaljj:addingSaulInference
Conversation
|
Do you have any documentation that we can start reading from there? |
|
Yes, see the changes. |
|
ok, I'll do. |
saul-core/doc/SAULLANGUAGE.md
Outdated
| the inference starts from the head object. This function finds the objects of type `INPUT_TYPE` which are | ||
| connected to the target object of type `HEAD_TYPE`. If we don't define `filter`, by default it returns all | ||
| objects connected to `HEAD_TYPE`. The filter is useful for the `JointTraining` when we go over all | ||
| global objects and generate all contained object that serve as examples for the basic classifiers involved in |
saul-core/doc/SAULLANGUAGE.md
Outdated
| In Saul, the constraints are defined for the assignments to class labels. | ||
| A constraint classifiers is a classifier that predicts the class labels with regard to the specified constraints. | ||
| In Saul, the constraints are defined for the assignments to class labels. In what follows we outine the details of operators | ||
| which help us define the constraints. Before jumping into the details, note that you have to have the folling import |
saul-core/doc/SAULLANGUAGE.md
Outdated
|
|
||
| In the above definition, `on` and `is` are keywords. | ||
|
|
||
| Here different variations of this basic, but there are different variations to it: |
There was a problem hiding this comment.
nit: didn't understand this sentence.
saul-core/doc/SAULLANGUAGE.md
Outdated
| | `ForEach` | This operator works only on `Node`s. For each single instance in the node. This is often times one of the starting points for defining constraints. So if you are defining using a constrained classifier with head type `HEAD_TYPE`, we the definition of the constraint have to start with the node corresponding to this type. | `textAnnotationNode.ForEach { x: TextAnnotation => Some-Constraint-On-X }` | | ||
| | `ForAll` | For **all** the elements in the collection it applies the constraints. In other words, the constrain should hold for **all** elements of the collection. | `textAnnotationNode.ForAll { x: TextAnnotation => Some-Constraint-On-x }` | | ||
| | `Exists` | The constrain should hold for **at least one** element of the collection. | `textAnnotationNode.Exists { x: TextAnnotation => Some-Constraint-On-x }` | | ||
| | `AtLest(k: Int)` | The constrain should hold for **at least `k`** elements of the collection. | `textAnnotationNode.AtLeast(2) { x: TextAnnotation => Some-Constraint-On-x }` | |
There was a problem hiding this comment.
nit: AtLeast(k: Int) in the first column.
saul-core/doc/SAULLANGUAGE.md
Outdated
|
|
||
| There are just the definitions of the operations. If you want to see real examples of the operators in actions see [the definitions of constraints for ER-example](https://github.com/IllinoisCogComp/saul/blob/master/saul-examples/src/main/scala/edu/illinois/cs/cogcomp/saulexamples/nlp/EntityRelation/EntityRelationConstraints.scala). | ||
|
|
||
| **Tip:** Note whenever the constrained inference is infeasible (i.e. the constraints are overlly tight), we use the default |
| case (learner, trainInstances) => | ||
| logger.info(evalSeparator) | ||
| logger.info("Training " + learner.getClassSimpleNameForClassifier) | ||
| println(evalSeparator) |
There was a problem hiding this comment.
Is the change from logger.info to println intentional? No strong opinion, just wanted to confirm.
There was a problem hiding this comment.
I changed it because it was messing up the formatting of the output results. But now in retrospect, I think only the ones related to testing was needed. I returned back the rest.
| val instanceIsInvolvedInConstraint = instancesInvolved.exists { set => | ||
| set.exists { | ||
| case x: T => x == t | ||
| case everythingElse => false |
| case (singleConstraint, ins) => | ||
| ins union getInstancesInvolved(singleConstraint).asInstanceOf[Set[Any]] | ||
| } | ||
| case c: AtMost[_, _] => |
There was a problem hiding this comment.
Probably, I miss something here, but could you explain a bit: all these have the same body? AtMost, AtLeast, forAll...?
There was a problem hiding this comment.
Good point. Merged them into one by adding an extra type.
|
Applied the comments. Let me know if you have any other comments. |
| object TestConstraintClassifier extends ConstrainedClassifier[String, String] { | ||
| override def subjectTo = None | ||
| override val solverType = OJAlgo | ||
| override lazy val onClassifier = TestClassifier |
There was a problem hiding this comment.
Maybe change the onClassifier to baseClassifier
| object OrgConstrainedClassifier extends ConstrainedClassifier[ConllRawToken, ConllRelation] { | ||
| override lazy val onClassifier = EntityRelationClassifiers.OrganizationClassifier | ||
| override def pathToHead = Some(-EntityRelationDataModel.pairTo2ndArg) | ||
| override def subjectTo = Some(EntityRelationConstraints.relationArgumentConstraints) |
There was a problem hiding this comment.
What if I want to say subjectTo = Some(worksForConstraint)? what will be the syntax? workForContatint needs an input parameter.
There was a problem hiding this comment.
Another relevant comment, how can I just express true or false i.e. constant expressions as constraints. Please add these to the documentation also.
There was a problem hiding this comment.
What if I want to say subjectTo = Some(worksForConstraint)? what will be the syntax? workForContatint needs an input parameter.
Regarding the first questions (as also mentioned in the documentation), the definition of the constraints starts with Node and use ForEach operator. If you want to define "worksForConstraints", instead of having it as a function:
def worksForConstraint(x: ConllRelation) = {
(WorksForClassifier on x isTrue) ==> ((PersonClassifier on x.e1 isTrue) and (OrganizationClassifier on x.e2 isTrue))
}it should be written as
def worksForConstraint = EntityRelationDataModel.pairs.ForEach { x: ConllRelation =>
(WorksForClassifier on x isTrue) ==> ((PersonClassifier on x.e1 isTrue) and (OrganizationClassifier on x.e2 isTrue))
}and then you can do subjectTo = Some(worksForConstraint) ....
Another relevant comment, how can I just express true or false i.e. constant expressions as constraints. Please add these to the documentation also.
You can't (and you shouldn't) define constants.
There was a problem hiding this comment.
why not mapping the case with constant True to the case with no constraints and the case with constant False to an infeasible solution message? I think it would more expressive and robust to cover all possible logical expressions.
There was a problem hiding this comment.
why not mapping the case with constant True to the case with no constraints and the case with constant False to an infeasible solution message? I think it would more expressive and robust to cover all possible logical expressions.
I can almost surely guarantee that you never need to use constant True or constant False. Can you come up with an example that either of these constants are necessary?
There was a problem hiding this comment.
From my view, it is a matter of completeness of the representation. And those are the basic cases that your system should not fail to address those.
There was a problem hiding this comment.
If something is never needed why is it a "basic case" and "matter of completeness"?
There was a problem hiding this comment.
Unless you give me a concrete example that it's needed, I won't be convinced.
There was a problem hiding this comment.
Can you give me a concrete application example that a hypothesis which says h=True is useful? Why the most general hypothesish=True is discussed and mentioned at all!!
Your argument looks like this to me.
There was a problem hiding this comment.
If something is never needed why is it a "basic case" and "matter of completeness?"
I missed this sentence, now I see it. I think the confusion comes from where we use first-order logic inside a different formalism that is ILP, therefore, it seems we can look at it only very practically without thinking if our constraint representation is even sound. From my view when you talk about logical expressions in any context the first expressions to think of are True and False and if I can not represent these, I am not sure what I am talking about then. We probably need to ask a third or more opinions on this. I do not have concrete examples.
|
For Foreach, I think the documentation is not clear enough, I can not see easily that I alway need to start writing any kind of constraint with Foreach expression, maybe you should say this above the table. Also, why this should be the case? I guess having ForAll and Foreach with two different semantics here will be confusing, from the logical expressions perspective. Unless this is technically impossible to change, I would suggest changing this. |
Yes there is a conceptual difference.
For |
Update Badge example to use new Constraint convention.
Update the Saul Inference PR
bhargav
left a comment
There was a problem hiding this comment.
Also I agree with @kordjamshidi 's comment about the ambiguity of ForEach and ForAll. ForEach is implemented as a quantifier that you can apply to nodes.
But ForAll is implemented as a conjunction of constraint clauses. If we plan to keep this convenience notation, we should rename to avoid confusion.
Ideally, we should be able to do node.ForAll as a quantifier.
| | Operator | Definition | Example | | ||
| |----------|------------|---------|---| | ||
| | `ForEach` | This operator works only on `Node`s. For each single instance in the node. This is often times one of the starting points for defining constraints. So if you are defining using a constrained classifier with head type `HEAD_TYPE`, we the definition of the constraint have to start with the node corresponding to this type. | `textAnnotationNode.ForEach { x: TextAnnotation => Some-Constraint-On-X }` | | ||
| | `ForAll` | For **all** the elements in the collection it applies the constraints. In other words, the constrain should hold for **all** elements of the collection. | `textAnnotationNode.ForAll { x: TextAnnotation => Some-Constraint-On-x }` | |
There was a problem hiding this comment.
Minor typo: constrain -> constraint in this and the next 4 lines.
| object TestConstraintClassifier extends ConstrainedClassifier[String, String] { | ||
| override def subjectTo = None | ||
| override val solverType = OJAlgo | ||
| override lazy val onClassifier = TestClassifier |
| } | ||
|
|
||
| /** find all the instances used in the definiton of the constraint. This is used in caching the results of inference */ | ||
| private def getInstancesInvolved(constraint: Constraint[_]): Set[_] = { |
There was a problem hiding this comment.
The getInstancesInvolved and getClassifiersInvolved methods do not depend on ConstrainedClassifer (They don't need to be here). We should add these methods to the trait/abstract class for Constraint and have them implemented there.
There was a problem hiding this comment.
I meant adding them to the trait for Constraint[T] here https://github.com/danyaljj/saul-1/blob/addingSaulInference/saul-core/src/main/scala/edu/illinois/cs/cogcomp/saul/classifier/infer/Constraints.scala#L114
There was a problem hiding this comment.
I see; that is doable, but not sure if we would gain significant anything from that.....
|
@bhargav I think the introduction of |
|
Updated the documentation to make the confusion between |
|
Overall the change look good. I want to do a test on the JVM memory usage after these changes. I'll do a comparative run on some of our examples. ETA: 1-2 days. |
Items to finish:
Results on ER (L+I):
Exactly the same numbers in the readme file: