Commit ce8ac23

fix some IE training data
1 parent 37e432e commit ce8ac23

8 files changed: 151 additions, 120 deletions

notebooks/train-ie-openhours-parser.ipynb

Lines changed: 136 additions & 108 deletions
Large diffs are not rendered by default.

webstruct_data/corpus/ie/annotated/35.xml

Lines changed: 2 additions & 2 deletions
@@ -18,8 +18,8 @@
 <p>&#160;</p>
 <h2 class="title">Opening Hours</h2>
 
-<br/><p class="text1">Tuesday - Sunday:
-<br/>17:00:00 - 23:00:00</p>
+<br/><p class="text1"><HOURS>Tuesday - Sunday:
+<br/>17:00:00 - 23:00:00</p></HOURS>
 </td></tr>
 </tbody></table></div></div></div><tr><td></td></tr><div id="sidebar"><div id="sidebar-bgtop"><div id="sidebar-bgbtm"><div id="box-container"><div id="box2">New!</div>
 <div id="box1"><ul class="style1"><li class="first active"><a href="special.php">Tikka masala</a></li>

webstruct_data/corpus/ie/annotated/45.xml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 <li id="delivery"><ul><li class="first_item"><strong>Eat In</strong></li>
 <li><strong>Take Out</strong></li>
 <li>Delivery</li>
-</ul><h5><strong>Delivery Hours</strong> from 6PM to 11PM</h5>
+</ul><h5><strong>Delivery Hours</strong> <HOURS>from 6PM to 11PM</h5></HOURS>
 
 <p><strong>Delivery Charge</strong> €2.50, Minimum Order €10</p>
 </li>

webstruct_data/corpus/ie/annotated/50.xml

Lines changed: 2 additions & 2 deletions
@@ -16,10 +16,10 @@
 <br/><strong>E.</strong> <a href="mailto:bakesncakes@hotmail.com" title="email bake my day">bakesncakes@hotmail.com</a>
 </span></span></span></span></p><br/>
 <p><span class="st"><span class="fbProfileBylineFragment"><span class="fbProfileBylineLabel"><span><strong>Opening Hours</strong>
-<br/>Monday - Friday 8am to 4.30pm</span></span></span></span></p>
+<br/><HOURS>Monday - Friday 8am to 4.30pm</span></span></span></span></p>
 
 <p><span class="st"><span class="fbProfileBylineFragment"><span class="fbProfileBylineLabel"><span>Saturday 8am to 5pm
-</span></span></span></span></p><br/>
+</span></span></span></span></p></HOURS><br/>
 </div><div id="left-sidebar-inner-col2"><div id="blockStyle185Main267" class=" ccm-block-styles"><h3>Bake My Day Location</h3>
 </div></div><div id="googleMapCanvas185" class="googleMapCanvas"></div><div class="clear"><div id="footer"><div id="footer-inner"><div class="clear">
 <p class="footer-copyright">&#169; 2013 Bake My Day | Creative cakes, delicious fresh breads and coffee in Drimnagh, Dublin 12, Ireland T. 01 4264875 E. <a href="mailto:bakesncakes@hotmail.com">bakesncakes@hotmail.com</a></p>

webstruct_data/corpus/ie/annotated/53.xml

Lines changed: 2 additions & 1 deletion
@@ -207,6 +207,7 @@
 <p><a href="https://www.petworld.ie/nationwide-delivery/"><img alt="" src="/images/free-shipping.jpg"/></a></p><p>&#160;</p>
 
 <p>Business Hours</p>
+<HOURS>
 <table border="0"><tbody><tr><td>
 <p><span>Monday</span></p>
 </td><td>
@@ -265,7 +266,7 @@
 <p><span>6pm</span></p>
 </td></tr>
 </tbody></table><p>&#160;</p>
-
+</HOURS>
 <p>&#160;</p>
 
 <p>&#160;</p>

webstruct_data/corpus/ie/annotated/55.xml

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,7 @@
 </td></tr><td></td>
 </tbody></table></div></div></td><tr><td colspan="2"><div id="c2546" class="csc-default"><td width="10px"></td></div></td></tr><td valign="top" width="250px"><div id="c2547" class="csc-default"><h1>Opening times</h1>
 <table class="contenttable" width="100%"><tbody><tr><td>
+<HOURS>
 <p class="bodytext">Mon</p>
 </td><td>
 <p class="bodytext">Closed</p>
@@ -54,6 +55,7 @@
 <tr><td><p class="bodytext">Sat</p>
 </td><td>
 <p class="bodytext">8.30 - 5.30</p>
+</HOURS>
 </td></tr>
 </tbody></table></div></td></tr><div id="c2548" class="csc-default"></div>
 </tbody></table></div><div id="c2536" class="csc-default"><h1>Salon Location</h1>

webstruct_data/corpus/ie/annotated/58.xml

Lines changed: 2 additions & 2 deletions
@@ -38,14 +38,14 @@
 </div><div class="sidebar_full"><h2>Press</h2>
 <ul class="list"><div class="testimonial">
 <p>&quot;TV3s Alan Hughes &amp; his husband Karl were on hand recently to help Lisa Wallace open her brand-new salon on Dublins South William Street (01) 677 88 22. The salubrious salon exclusively uses the PAYOT skincare range. The salon looks fab &amp; Lisa has to be one of the most passionate women in beauty. We predict good things!&quot; <strong>Mairead NicGidla Phadhraig Beauty News, The Star Chic Weekly</strong></p>
-</div></ul></div></div><div class="clear"></div><div class="home_bottom"><div class="left13 fdivider"><h2>Opening <span>Hours</span></h2>
+</div></ul></div></div><div class="clear"></div><div class="home_bottom"><div class="left13 fdivider"><h2>Opening <span>Hours</span></h2><HOURS>
 <ul class="list1"><li>Monday <span class="op-color">Closed</span></li>
 <li>Tuesday <span class="op-color">10am - 8pm</span></li>
 <li>Wednesday <span class="op-color">10am - 6pm</span></li>
 <li>Thursday <span class="op-color">10am - 8pm</span></li>
 <li>Friday <span class="op-color">10am - 6pm</span></li>
 <li>Saturday <span class="op-color">10am - 6pm</span></li>
-<li>Sunday <span class="op-color">Closed</span></li>
+<li>Sunday <span class="op-color">Closed</span></li></HOURS>
 </ul></div><div class="left13 fdivider"><h2>From the <span>Blog</span></h2>
 <ul class="list"><li><a href="http://www.lisawallace.ie/great-skin-tip-1">Great Skin Tip #1</a></li>
 <li><a href="http://www.lisawallace.ie/great-skin-tip-2">Great Skin Tip #2</a></li>

webstruct_data/corpus/ie/annotated/6.xml

Lines changed: 4 additions & 4 deletions
@@ -46,10 +46,10 @@
 </span><br/><span>Email: info@theexchequer.ie
 </span><br/><span>www.theexchequer.ie</span></p>
 
-<p><span>Mon to Wed 12pm to 11.30pm
+<HOURS><p><span>Mon to Wed 12pm to 11.30pm
 </span><br/><span>Thurs 12pm to 12.30am
 </span><br/><span>Fri &amp; Sat 12pm to 2.30am
-</span><br/><span>Sun 12pm to 11pm</span></p>
+</span><br/><span>Sun 12pm to 11pm</span></p></HOURS>
 </div></div></div></article><div class="hr"><hr/></div><footer><div id="footer_top"><div class="outer bottom"><div class="middle"><div class="inner"><div class="container"><div class="col col_odd"><h3>Navigation</h3>
 <ul class="left"><li><a href="http://www.theexchequer.ie/&lt;br /&gt;
 &lt;b&gt;Notice&lt;/b&gt;: Undefined variable: access in &lt;b&gt;/home/theexche/public_html/includes/footer.php&lt;/b&gt; on line &lt;b&gt;19&lt;/b&gt;&lt;br /&gt;">Home</a></li>
@@ -71,10 +71,10 @@
 <br/>email: <a href="mailto:info@theexchequer.ie">info@theexchequer.ie</a>
 <br/>www.theexchequer.ie</p>
 </div><div class="col col_odd"><h3>Opening Hours</h3>
-<ul><li>Mon to Wed, Midday to 11.30pm</li>
+<HOURS><ul><li>Mon to Wed, Midday to 11.30pm</li>
 <li>Thurs, Midday to 12.30am</li>
 <li>Fri &amp; Sat, Midday to 2.30am</li>
-<li>Sun, Midday to 11pm</li>
+<li>Sun, Midday to 11pm</li></HOURS>
 </ul></div><div class="col col_even last_col"><div id="book_table_footer"><a class="book_table book_table_foot" href="book_table.php">Book A Table Now</a></div>
 </div></div></div></div></div><div class="logos"><div class="logo_container"><img class="left" alt="" src="http://www.theexchequer.ie/images/logo_restaurants_association.png"/><div class="logo_container"><img class="right" alt="" src="http://www.theexchequer.ie/images/logo_the_gathering.png"/><div class="clear"></div></div></div></div><div id="footer_bottom"><div class="outer"><div class="middle"><div class="container">
 <p class="left">Designed &amp; Produced by <a href="http://www.rocketbug.com/2.0/">rocketbug</a></p>
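
Note on the annotations: in several of the hunks above the HOURS markers cross element boundaries (35.xml opens the span inside `<p>` and closes it after `</p>`; 45.xml does the same with `<h5>`), so the annotated files are not necessarily well-formed XML. The following is a minimal sketch for spot-checking what text each span covers. It treats the markers as plain text delimiters; the regex approach and the helper name `extract_hours` are assumptions made for illustration, not the loader this repository uses.

```python
import html
import re
from typing import List

# One <HOURS>...</HOURS> span; DOTALL lets the span run across line breaks.
HOURS_SPAN = re.compile(r"<HOURS>(.*?)</HOURS>", re.DOTALL)
# Any remaining markup tag inside the span.
TAG = re.compile(r"<[^>]+>")


def extract_hours(markup: str) -> List[str]:
    """Return the plain text covered by each HOURS span (hypothetical helper)."""
    spans = []
    for match in HOURS_SPAN.finditer(markup):
        text = TAG.sub(" ", match.group(1))   # replace nested tags with spaces
        text = html.unescape(text)            # decode entities such as &amp;
        spans.append(" ".join(text.split()))  # collapse runs of whitespace
    return spans


# Sample copied from the 35.xml hunk.
sample = (
    '<br/><p class="text1"><HOURS>Tuesday - Sunday:\n'
    "<br/>17:00:00 - 23:00:00</p></HOURS>"
)
print(extract_hours(sample))
```

Because the markers are matched as text rather than parsed as elements, the check works regardless of how the span nests relative to the surrounding HTML. It does not validate that every opening marker has a closing one; an unclosed span is simply skipped.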
