Commit e09ca6b

Update README.md
1 parent b85d44c · commit e09ca6b

File tree

1 file changed: +2 -1 lines

README.md

Lines changed: 2 additions & 1 deletion
@@ -40,13 +40,14 @@ python train.py
 ### 3. Calculate similarity matrix (inference)
 ```
 Example:
-python cal_column_similarity.py -p Test\ Data/self -m model/2022-04-11-17-10-11
+python cal_column_similarity.py -p Test\ Data/self -m model/2022-04-11-17-10-11 -s one-to-one
 python cal_column_similarity.py -p Test\ Data/authors -m model/2022-04-11-17-10-11 -t 0.9
 ```
 Parameters:
 - -p: Path to the test data folder; it must contain "Table1.csv" and "Table2.csv".
 - -m: Path to the trained model folder, which must contain at least one pair of ".model" and ".threshold" files.
 - -t: Threshold. Use this parameter to set the threshold value explicitly; 0.9 is suggested for easy matching (column names very similar). The default is calculated from the training data (around 0.15-0.2) and is intended for difficult matching (column names masked or very different).
+- -s: Strategy. Two options: "one-to-one" and "one-to-many". "one-to-one" means each column can be matched to at most one column; "one-to-many" imposes no restriction. Default is "one-to-many".
 ## Feature Engineering
 
 Features: "is_url", "is_numeric", "is_date", "is_string", "numeric:mean", "numeric:min", "numeric:max", "numeric:variance", "numeric:cv", "numeric:unique/len(data_list)", "length:mean", "length:min", "length:max", "length:variance", "length:cv", "length:unique/len(data_list)", "whitespace_ratios:mean", "punctuation_ratios:mean", "special_character_ratios:mean", "numeric_ratios:mean", "whitespace_ratios:cv", "punctuation_ratios:cv", "special_character_ratios:cv", "numeric_ratios:cv", "colname:bleu_score", "colname:edit_distance", "colname:lcs", "colname:tsm_cosine", "colname:one_in_one"
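The "one-to-one" strategy added by this commit is easiest to see on a small example. The sketch below is a minimal illustration, assuming the tool produces a column-by-column similarity matrix and a threshold; the function name `match_columns` and the greedy assignment are hypothetical, not the repository's actual implementation.

```python
import numpy as np

def match_columns(similarity, threshold=0.9, strategy="one-to-many"):
    """Illustrative matching over an (n1 x n2) column-similarity matrix.

    strategy="one-to-many": report every pair whose score clears the threshold.
    strategy="one-to-one":  each column is used at most once; pairs are taken
                            greedily in descending order of score.
    """
    pairs = []
    if strategy == "one-to-many":
        rows, cols = np.where(similarity >= threshold)
        pairs = [(int(r), int(c), float(similarity[r, c])) for r, c in zip(rows, cols)]
    else:  # "one-to-one"
        used_rows, used_cols = set(), set()
        # Flattened indices of all scores, highest first.
        order = np.argsort(similarity, axis=None)[::-1]
        for flat in order:
            r, c = np.unravel_index(flat, similarity.shape)
            if similarity[r, c] < threshold:
                break
            if r not in used_rows and c not in used_cols:
                pairs.append((int(r), int(c), float(similarity[r, c])))
                used_rows.add(r)
                used_cols.add(c)
    return pairs

# Example: with the default threshold of 0.9, "one-to-one" keeps only the best
# pairing per column, while "one-to-many" also reports the (1, 0) pair.
sim = np.array([[0.95, 0.20],
                [0.90, 0.92]])
print(match_columns(sim, strategy="one-to-one"))   # [(0, 0, 0.95), (1, 1, 0.92)]
print(match_columns(sim, strategy="one-to-many"))  # [(0, 0, 0.95), (1, 0, 0.9), (1, 1, 0.92)]
```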

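For the feature list shown above, here is a rough sketch of how a few of the per-column statistics might be computed. The helper `column_profile` and the reading of "cv" as coefficient of variation (std / mean) are assumptions for illustration, not the repository's exact feature extraction.

```python
import numpy as np

def column_profile(data_list):
    """Compute a handful of the listed features for one column (illustrative only)."""
    strings = [str(v) for v in data_list]
    lengths = np.array([len(s) for s in strings], dtype=float)

    # Values that parse as numbers; used for the "numeric:*" features.
    numeric = []
    for s in strings:
        try:
            numeric.append(float(s))
        except ValueError:
            pass

    feats = {
        "is_numeric": float(len(numeric) == len(strings) and len(strings) > 0),
        "length:mean": lengths.mean(),
        "length:min": lengths.min(),
        "length:max": lengths.max(),
        "length:variance": lengths.var(),
        # "cv" taken here as coefficient of variation: std / mean.
        "length:cv": lengths.std() / lengths.mean() if lengths.mean() else 0.0,
        "length:unique/len(data_list)": len(set(lengths.tolist())) / len(strings),
        "whitespace_ratios:mean": float(np.mean(
            [s.count(" ") / len(s) if s else 0.0 for s in strings])),
        "numeric_ratios:mean": float(np.mean(
            [sum(ch.isdigit() for ch in s) / len(s) if s else 0.0 for s in strings])),
    }
    if numeric:
        arr = np.array(numeric)
        feats.update({
            "numeric:mean": arr.mean(),
            "numeric:min": arr.min(),
            "numeric:max": arr.max(),
            "numeric:variance": arr.var(),
            "numeric:cv": arr.std() / arr.mean() if arr.mean() else 0.0,
        })
    return feats

# Example: profile a small numeric-looking column.
print(column_profile(["12", "340", "5"]))
```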