The importance of QueryRegexEscapeOptions
Today I upgraded Koha to version 21.05, but not without a curious indexing error. At the same time, I also upgraded Elasticsearch to the latest 6.x release, 6.8.20.
The Problem
After running apt-get update && apt-get upgrade on Ubuntu 18.04, which upgraded Elasticsearch and Koha together, searching for particular records in both the OPAC and the staff interface returned Error 500s instead of the bibliographic details you would expect from the opac-detail page. This only affected a subset of my 327 bibliographic records.
Example
The problem seems to be related to the AACR2r punctuation in MARC field 245 subfield $b; specifically, the trailing “/” is not being escaped when Koha builds the Elasticsearch query.
I checked my log files by running
tail -f /var/log/koha/$INSTANCE/*.log
and I noticed the following:
==> /var/log/koha/library2/plack-opac-error.log <==
[2021/10/31 13:09:28] [WARN] [Request] ** [http://localhost:9200]-[400] [query_shard_exception] Failed to parse query [(host-item:(The* finer* points* of* sausage* dogs* /))], with: {"index":"koha_library2_biblios","index_uuid":"eSiLbWAnRqO0CR7sIjRfDw"}, called from sub Search::Elasticsearch::Role::Client::Direct::__ANON__ at /usr/share/koha/lib/Koha/SearchEngine/Elasticsearch/Search.pm line 96. With vars: {'body' => {'error' => {'type' => 'search_phase_execution_exception','grouped' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'failed_shards' => [{'index' => 'koha_library2_biblios','node' => 'ZCj2zlZpSueOWf0foNWidQ','reason' => {'reason' => 'Failed to parse query [(host-item:(The* finer* points* of* sausage* dogs* /))]','index' => 'koha_library2_biblios','type' => 'query_shard_exception','caused_by' => {'caused_by' => {'reason' => 'Lexical error at line 1, column 55. Encountered: <EOF> after : "/))"','type' => 'token_mgr_error'},'type' => 'parse_exception','reason' => 'Cannot parse \'(host-item:(The* finer* points* of* sausage* dogs* /))\': Lexical error at line 1, column 55. Encountered: <EOF> after : "/))"'},'index_uuid' => 'eSiLbWAnRqO0CR7sIjRfDw'},'shard' => 0}],'root_cause' => [{'reason' => 'Failed to parse query [(host-item:(The* finer* points* of* sausage* dogs* /))]','type' => 'query_shard_exception','index_uuid' => 'eSiLbWAnRqO0CR7sIjRfDw','index' => 'koha_library2_biblios'}],'reason' => 'all shards failed','phase' => 'query'},'status' => 400},'status_code' => 400,'request' => {'mime_type' => 'application/json','path' => '/koha_library2_biblios/_search','qs' => {},'serialize' => 'std','ignore' => [],'method' => 'GET','body' => {'size' => 0,'aggregations' => {'author' => {'terms' => {'size' => '20','field' => 'author__facet'}},'location' => {'terms' => {'size' => '20','field' => 'location__facet'}},'holdingbranch' => {'terms' => {'field' => 'holdingbranch__facet','size' => '20'}},'ccode' => {'terms' => {'size' => '20','field' => 'ccode__facet'}},'title-series' => {'terms' => {'field' => 'title-series__facet','size' => '20'}},'itype' => {'terms' => {'size' => '20','field' => 'itype__facet'}},'subject' => {'terms' => {'field' => 'subject__facet','size' => '20'}},'su-geo' => {'terms' => {'size' => '20','field' => 'su-geo__facet'}},'ln' => {'terms' => {'size' => '20','field' => 'ln__facet'}}},'from' => 0,'query' => {'query_string' => {'default_operator' => 'AND','fields' => 
['author','subject','title-later','ln-audio','title-expanded','number-natl-biblio','date-time-last-modified','notforloan','damaged','udc-classification','datelastseen','curriculum','control-number','identifier-publisher-for-music','related-periodical','lexile-number','holdingbranch','material-type','host-item','subject-name-personal','ff7-01-02','ccode','language-original','note','thematic-number','totalissues','title','music-key','nlm-call-number','editor','date-of-publication','cn-prefix','code-geographic','host-item-number','title-former','rtype','issues','date-of-acquisition','materials-specified','cn-suffix','copydate','cn-bib-source','personal-name','ln-subtitle','llength','not-onloan-count','ff8-23','location','ff7-00','koha-auth-number','title-key','stack','microform-generation','lf','geographic-class','number-govt-pub','publisher','number-legal-deposit','bgf-number','author-title','local-classification','stock-number','cn-class','name-geographic','conference-name','record-source','lc-card-number','author-name-corporate','dewey-classification','local-number','replacementprice','number-db','interest-age-level','renewals','itemnumber','record-control-number','interest-grade-level','identifier-other','ctype','cross-reference','replacementpricedate','dissertation-information','title-abbreviated','cn-sort','title-series','other-control-number','name','issn','price','code-institution','biblioitemnumber','abstract','classification-source','arl','su-geo','index-term-genre','provider','withdrawn','bio','extent','coded-location-qualifier','isbn','reserves','ta','index-term-uncontrolled','author-in-order','title-cover','date-entered-on-file','title-collective','corporate-name','uri','name-and-title','nal-call-number','report-number','identifier-standard','reading-grade-level','itemtype','author-name-personal','ff7-01','homebranch','pl','coden','ff8-29','indexed-by','lc-call-number','copynumber','bib-level','acqsource','ff7-02','map-scale','bnb-card-number','lost','barcode','restricted','datelastborrowed','number-local-acquisition','ln','cn-item','author-personal-bibliography','title-uniform','cn-bib-sort','arp','title-other-variant','itype'],'fuzziness' => 'auto','lenient' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),'type' => 'cross_fields','analyze_wildcard' => $VAR1->{'request'}{'body'}{'query'}{'query_string'}{'lenient'},'query' => '(host-item:(The* finer* points* of* sausage* dogs* /))'}}}}}
I then did a Google search to see what I could find and came across an October 2021 thread on the Koha mailing list: https://lists.katipo.co.nz/public/koha/2021-October/056849.html. It mentioned that Elasticsearch sometimes throws an Error 500 when certain characters present in bib records aren't being escaped.
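Out of curiosity, you can reproduce the failure outside Koha by sending the same kind of query_string query straight to Elasticsearch with curl. This is only a minimal sketch, using the node and index names that appear in the log above (adjust them for your own install): in Lucene query syntax an unescaped “/” begins a regular expression, so the trailing slash leaves the parser waiting for a closing delimiter, whereas an escaped “\/” is read as a literal character.
# This should fail with the same query_shard_exception / lexical error:
curl -s -H 'Content-Type: application/json' 'http://localhost:9200/koha_library2_biblios/_search?pretty' -d '{"query":{"query_string":{"lenient":true,"query":"host-item:(The* finer* points* of* sausage* dogs* /)"}}}'
# Escaping the slash (\\/ in the JSON body arrives as \/ in the query) lets it parse:
curl -s -H 'Content-Type: application/json' 'http://localhost:9200/koha_library2_biblios/_search?pretty' -d '{"query":{"query_string":{"lenient":true,"query":"host-item:(The* finer* points* of* sausage* dogs* \\/)"}}}'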
The Fix
I tried reindexing my bibs and authorities in Elasticsearch. No effect.
I tried restarting memcached, Plack, Koha and Apache. No effect.
I tried switching to Zebra via the SearchEngine system preference. That made the Error 500s go away and confirmed the problem lay with Elasticsearch, but it was a workaround rather than a fix. (Rough versions of the commands for these steps are sketched below.)
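For anyone retracing those steps on a Debian-package install like mine (instance name library2), the commands were roughly the following. Treat this as a sketch rather than gospel: script paths and flags can differ between Koha versions, so check --help before copying anything. Switching search engines, by contrast, is done through the SearchEngine system preference in the staff interface rather than from the shell.
# Full Elasticsearch reindex of bibliographic and authority records
# (on package installs the script should live under /usr/share/koha/bin/):
sudo koha-shell -c "perl /usr/share/koha/bin/search_tools/rebuild_elasticsearch.pl -d -b -a -v" library2
# Restart the usual suspects:
sudo systemctl restart memcached
sudo koha-plack --restart library2
sudo systemctl restart koha-common
sudo systemctl restart apache2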
What did work was the following:
There is a system preference in Koha called QueryRegexEscapeOptions, which looks like this:

[Screenshot: the QueryRegexEscapeOptions system preference in Koha 21.05]

I initially had my value set to “Unescape escaped”. I changed this to “Escape” and the Error 500s disappeared!
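I made the change through the staff interface (Administration › System preferences › Searching), but the preference can also be inspected and set from the command line via the systempreferences table. Here is a hedged sketch for my library2 instance; I believe the stored values for this preference are dont_escape, escape and unescape_escaped, but verify that against your own database, and remember that sysprefs are cached, so restart Plack and memcached after editing them directly.
# Check the current value:
echo "SELECT variable, value FROM systempreferences WHERE variable = 'QueryRegexEscapeOptions';" | sudo koha-mysql library2
# Set it to "Escape" (stored, I believe, as the value 'escape'):
echo "UPDATE systempreferences SET value = 'escape' WHERE variable = 'QueryRegexEscapeOptions';" | sudo koha-mysql library2
# Clear the cached value:
sudo systemctl restart memcached
sudo koha-plack --restart library2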