While running a baseline update in Endeca, we got the following error:

[09.24.20 13:02:06] SEVERE: Crawl 'MyAppen-last-mile-crawl' failed with error: Problem running full acquisition on data source for MyAppen-last-mile-crawl: Error reading from Record Store MyAppen-data: /data/apps/endeca/CAS/workspace/state/MyAppen-data/data/storage/generations/generation-0000000162 (No such file or directory).
The cas-service.log shows a related symptom:

2020-09-24 15:13:40,039 INFO [MyAppen-data] [MyAppen-data-cleaner-1] com.endeca.itl.recordstore.impl.storage.RecordStorageFileMergeCursor: Merge cursor skipped 0 entries with duplicate ids out of 0 entries
2020-09-24 15:13:40,039 WARN [MyAppen-data] [MyAppen-data-cleaner-1] com.endeca.itl.recordstore.impl.RecordStoreImpl: Exception when cleaning generations older than 185
2020-09-24 15:13:40,039 WARN [MyAppen-data] [MyAppen-data-cleaner-1] com.endeca.itl.recordstore.impl.Cleaner: Exception caught while performing cleanup
com.endeca.itl.recordstore.RecordStoreException: /data/apps/endeca/CAS/workspace/state/MyAppen-data/data/storage/generations/generation-0000000162 (No such file or directory)
	at com.endeca.itl.recordstore.impl.RecordStoreImpl.clean(RecordStoreImpl.java:640)
	at com.endeca.itl.recordstore.impl.Cleaner$1.run(Cleaner.java:77)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	at com.endeca.itl.util.LoggingContextAwareThread.run(LoggingContextAwareThread.java:73)
Caused by: java.io.FileNotFoundException: /data/apps/endeca/CAS/workspace/state/MyAppen-data/data/storage/generations/generation-0000000162 (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at com.endeca.itl.recordstore.impl.storage.RecordStorageEntryReader.<init>(RecordStorageEntryReader.java:63)
	at com.endeca.itl.recordstore.impl.storage.RecordStorageFileMergeCursor$MergeBuffer.<init>(RecordStorageFileMergeCursor.java:155)
	at com.endeca.itl.recordstore.impl.storage.RecordStorageFileMergeCursor.<init>(RecordStorageFileMergeCursor.java:98)
	at com.endeca.itl.recordstore.impl.storage.RecordStorageFileMergeCursor.create(RecordStorageFileMergeCursor.java:70)
	at com.endeca.itl.recordstore.impl.storage.GenerationFileManager.openMergeCursor(GenerationFileManager.java:176)
	at com.endeca.itl.recordstore.impl.storage.CleanOperation.createNewSurvivorGenerationCursor(CleanOperation.java:142)
	at com.endeca.itl.recordstore.impl.storage.CleanOperation.prepareNewIndex(CleanOperation.java:95)
	at com.endeca.itl.recordstore.impl.storage.RecordStorageManagerImpl.clean(RecordStorageManagerImpl.java:292)
	at com.endeca.itl.recordstore.impl.RecordStoreImpl.clean(RecordStoreImpl.java:637)
	... 9 more

Why did this happen?

This happens when the CAS generation files are deleted from <CAS-installation>/workspace/state/MyAppen-data/data/storage/generations, usually to free up space when the filesystem is running low. CAS runs a cleanup every 'x' hours (configured in the record store configuration). This cleanup also reads the old generation files; when it cannot find them, the cleanup fails, and so do the subsequent baseline updates.
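Before changing anything, it can help to confirm which generation files are actually still on disk. The following Python sketch is not part of CAS: the helpers `list_generations` and `missing_generations` are hypothetical, the `generation-NNNNNNNNNN` file-name pattern is taken from the log above, and treating each file's modification time as the age of that generation is an assumption that may not match what CAS tracks internally.

```python
import os
import re
import time

# File-name pattern taken from the error log, e.g. "generation-0000000162".
GENERATION_RE = re.compile(r"^generation-(\d+)$")


def list_generations(generations_dir, now=None):
    """Return (generation_number, age_in_hours) pairs sorted by generation number.

    ASSUMPTION: a file's modification time approximates when CAS wrote that
    generation; CAS itself may track generation age differently.
    """
    now = time.time() if now is None else now
    generations = []
    for name in os.listdir(generations_dir):
        match = GENERATION_RE.match(name)
        path = os.path.join(generations_dir, name)
        # Only regular files named like generation files are considered.
        if match and os.path.isfile(path):
            age_hours = (now - os.path.getmtime(path)) / 3600.0
            generations.append((int(match.group(1)), age_hours))
    return sorted(generations)


def missing_generations(generations, expected_numbers):
    """Return the expected generation numbers that have no file on disk."""
    present = {number for number, _ in generations}
    return sorted(set(expected_numbers) - present)
```

For example, running `list_generations` on /data/apps/endeca/CAS/workspace/state/MyAppen-data/data/storage/generations and passing the result to `missing_generations(gens, [162])` should return `[162]` if the file named in the error is really gone, and the age of the oldest surviving file gives a rough idea of how far back generations were deleted.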

How to fix it?

We need to tell the CAS record-store to ignore the older generations and just focus on the most recent ones.

  1. Export the record store configuration: <CAS-installation>/11.3/bin/recordstore-cmd.sh get-configuration -a MyAppen-data -f /tmp/MyAppendataConfig.xml
  2. Open the exported file and look for an element like <generationRetentionTime>144.0</generationRetentionTime>. Here the generation retention time is 144 hours (6 days). If it is not specified, the default is 168 hours (1 week).
  3. If you deleted the generations older than, say, 3 days, set this value to 72 hours (3 days), i.e. <generationRetentionTime>72.0</generationRetentionTime>:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <recordStoreConfiguration xmlns="http://recordstore.itl.endeca.com/">
      <changePropertyNames/>
      <idPropertyName>record.id</idPropertyName>
      <cleanerInterval>1</cleanerInterval>
      <generationRetentionTime>72.0</generationRetentionTime>
      <jdbmSettings/>
    </recordStoreConfiguration>
    * cleanerInterval sets how often, in hours, the cleanup runs for this record store (here, every hour).
  4. Import the updated configuration back into the record store: <CAS-installation>/11.3/bin/recordstore-cmd.sh set-configuration -a MyAppen-data -f /tmp/MyAppendataConfig.xml
  5. Wait for the cleanup to run automatically (within 1 hour with the cleanerInterval above), then run the baseline update again.
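If you prefer not to edit the exported file by hand, the change between export and import can be scripted. This is a minimal Python sketch using only the standard library; `update_config` is a hypothetical helper, and it assumes the exported file looks like the example above (default namespace `http://recordstore.itl.endeca.com/`, with `generationRetentionTime` and `cleanerInterval` as direct children of the root). It raises an error instead of guessing where a missing element belongs. Rewriting the file also drops the `standalone="yes"` attribute and the original indentation; we assume `set-configuration` accepts that, so verify it on your installation.

```python
import xml.etree.ElementTree as ET

# Default namespace of the exported record store configuration (see the example above).
NS = "http://recordstore.itl.endeca.com/"


def _set_child_text(root, name, text):
    """Replace the text of a direct child element; refuse to guess where a missing one belongs."""
    elem = root.find(f"{{{NS}}}{name}")
    if elem is None:
        raise LookupError(f"<{name}> not found; add it by hand before re-importing")
    elem.text = text


def update_config(path, retention_hours, cleaner_interval_hours=None):
    """Rewrite an exported configuration file in place with new cleanup settings.

    ASSUMPTION: cleanerInterval takes a whole number of hours, as in the example.
    """
    # Serialize as xmlns="..." instead of an auto-generated "ns0:" prefix.
    ET.register_namespace("", NS)
    tree = ET.parse(path)
    root = tree.getroot()
    _set_child_text(root, "generationRetentionTime", str(float(retention_hours)))
    if cleaner_interval_hours is not None:
        _set_child_text(root, "cleanerInterval", str(int(cleaner_interval_hours)))
    tree.write(path, encoding="UTF-8", xml_declaration=True)
```

For example, calling `update_config("/tmp/MyAppendataConfig.xml", 72, cleaner_interval_hours=1)` after the export and before the import should leave the file with the same two values as the XML shown in step 3.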

Cheers,
Mayank Batra
