Description
Copying here some discussion from Slack about a problem we were seeing at Stanford that is related to the naming of this field.
The proposal: rename dct_references_s to dct_references_ss.
We noticed locally at Stanford that you could sometimes get really weird search results in EarthWorks, like thousands and thousands of results that didn't seem to share any text with your query.
We determined that the problem was that the dct_references_s field was actually getting indexed and used for searching, which meant that certain small strings or terms appeared on huge portions of records.
For example, every record with a WMS reference has the references key http://www.opengis.net/def/serviceType/ogc/wms in that field, so if you searched for something like "service", which appears in that text, you would get every record with a WMS link.
I just checked, and I think this is actually GBL's default behavior, because the field name ends in _s.
One very easy fix, though, is to use a different dynamic field type: if you use _ss, so that the field is dct_references_ss, it is stored (and can be used by GBL) but not indexed, so you don't search on it.
I just tried a very basic find/replace of every instance of dct_references_s with dct_references_ss in core GeoBlacklight (including both the Solr schema and all the fixture data), reindexed, and the result seemed to confirm my suspicions.
With dct_references_s, searching for something like "ogc" or "csdgm" that is part of one of the URIs returns a bunch of results. After the change to dct_references_ss, it doesn't.
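For anyone who wants to try the same rename on their own records rather than with a text find/replace, here is a minimal sketch in Python. It is not part of GBL: the function name and the sample record values are made up for illustration, and it assumes each record is a flat JSON object with dct_references_s as a top-level key.

```python
import json

OLD_FIELD = "dct_references_s"
NEW_FIELD = "dct_references_ss"


def rename_references_field(record: dict) -> dict:
    """Return a copy of the record with OLD_FIELD renamed to NEW_FIELD.

    Key order is preserved and all other fields are left untouched.
    (Hypothetical helper, not a GeoBlacklight API.)
    """
    return {
        (NEW_FIELD if key == OLD_FIELD else key): value
        for key, value in record.items()
    }


# Hypothetical record; the slug and the WMS endpoint are placeholders.
record = {
    "layer_slug_s": "example-layer",
    "dct_references_s": json.dumps(
        {"http://www.opengis.net/def/serviceType/ogc/wms": "https://example.com/wms"}
    ),
}

renamed = rename_references_field(record)
print(json.dumps(renamed, indent=2))
```

The records still need to be reindexed afterwards, and the schema and any application config that reads the field would need the same rename, as described above.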
See also: the field definition in question in GBL.