Proposal: rename the dct_references_s field to avoid indexing issues in Solr #68

@thatbudakguy

Copying some discussion from Slack here about a problem we were seeing at Stanford that is related to the naming of this field.

The proposal: rename dct_references_s to dct_references_ss.

we noticed locally at Stanford that you could sometimes get really weird search results in EarthWorks: thousands and thousands of results that didn't seem to share any text with your query

we determined that the problem was that the dct_references_s field was actually getting indexed and used for searching, which meant that certain small strings or terms appeared in huge portions of records

for example, every record with a WMS reference has the references key http://www.opengis.net/def/serviceType/ogc/wms, which meant that if you searched for something like "service", which appears in that text, you would get every record with a WMS link
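To make the "service" example concrete, here is a toy sketch of how a text analyzer could end up producing the term "service" from the WMS key. It assumes the analyzer splits on punctuation and on camelCase boundaries and then lowercases; `toy_tokens` is a made-up helper, not Solr's actual analysis chain, and the real behavior depends on how the schema is configured.

```python
import re

WMS_KEY = "http://www.opengis.net/def/serviceType/ogc/wms"


def toy_tokens(value: str) -> set[str]:
    """Toy tokenizer (an assumption, not Solr's analyzer).

    Splits on non-alphanumeric characters, then on camelCase
    boundaries, and lowercases every resulting piece.
    """
    tokens = set()
    for chunk in re.split(r"[^A-Za-z0-9]+", value):
        # "serviceType" -> ["service", "Type"]
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", chunk):
            tokens.add(part.lower())
    return tokens


tokens = toy_tokens(WMS_KEY)
print(sorted(tokens))
```

Under those assumptions, both "service" and "ogc" come out as standalone terms, so any record carrying this key would match a search for either word.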

just checked and i think this is actually GBL's default behavior, because the field name ends in _s

one very easy fix, though, is to use a different dynamic field type: if you use _ss, so the field is dct_references_ss, it is stored (and can be used by GBL) but not indexed, so you don't search on it

i just now tried a very basic find/replace of every instance of dct_references_s with dct_references_ss in core geoblacklight (including both the solr schema and all the fixture data), reindexed, and it seemed to confirm my suspicions
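For the fixture-data half of that find/replace, a minimal sketch of the rename on a single record might look like the following. The sample record and the `rename_references_field` helper are hypothetical; this only covers JSON documents, not the Solr schema or any application code that refers to the field. Renaming the key directly, rather than doing a raw string replace, avoids turning an already-renamed dct_references_ss into dct_references_sss.

```python
import json

OLD_FIELD = "dct_references_s"
NEW_FIELD = "dct_references_ss"


def rename_references_field(record: dict) -> dict:
    """Return a copy of the record with OLD_FIELD renamed to NEW_FIELD.

    Key order is preserved and all other fields are left untouched.
    """
    return {NEW_FIELD if key == OLD_FIELD else key: value
            for key, value in record.items()}


# Hypothetical fixture record, for illustration only.
sample = {
    "dc_title_s": "Example layer",
    "dct_references_s": json.dumps(
        {"http://www.opengis.net/def/serviceType/ogc/wms": "https://example.com/wms"}
    ),
}

renamed = rename_references_field(sample)
print(json.dumps(renamed, indent=2))
```

Running the function a second time on `renamed` leaves it unchanged, so it should be safe to apply to a mix of old and already-migrated records.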

if you try searching for something like "ogc" or "csdgm" that's part of one of the URIs, you get a bunch of results. when you change to dct_references_ss, you don't
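One way to repeat that before/after check is to ask Solr for just the hit count of a term, once with the old field name and once after renaming and reindexing. The sketch below assumes a local Solr core at a made-up URL and the default /select handler; `build_count_url` and `count_hits` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical local Solr core URL; adjust to your own setup.
SOLR_CORE_URL = "http://localhost:8983/solr/blacklight-core"


def build_count_url(term: str, core_url: str = SOLR_CORE_URL) -> str:
    """Build a /select URL that asks only for the hit count (rows=0)."""
    params = urlencode({"q": term, "rows": 0, "wt": "json"})
    return f"{core_url}/select?{params}"


def count_hits(term: str, core_url: str = SOLR_CORE_URL) -> int:
    """Query Solr and return numFound for the term (needs a running Solr)."""
    with urlopen(build_count_url(term, core_url)) as response:
        return json.load(response)["response"]["numFound"]


print(build_count_url("ogc"))
```

If the field is no longer searched after the rename, `count_hits("ogc")` and `count_hits("csdgm")` should drop sharply compared with the counts from before the change.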

See also: the field definition in question in GBL.
