Removing latest transactions from Alfresco Search Services


angelborroy
Alfresco Employee

This blog post describes the procedure to remove a number of the latest transactions from the Alfresco SOLR index so they can be re-indexed from scratch. This procedure can help in upgrade and re-indexing scenarios.

Alfresco SOLR cores (alfresco, archive) include different DOC_TYPE documents:

  • Node: documents with properties, content and permissions information
  • Acl: details of document permissions
  • Tx: details of transactions
  • AclTx: details of permission changes
  • State: current status of indexing process

The number of documents of each type can be obtained from the following URL:

http://localhost:8983/solr/#/alfresco/schema?field=DOC_TYPE
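The same per-type counts can also be read programmatically. The sketch below is a minimal illustration and not part of the original procedure: it assumes the core's standard `/select` handler accepts Solr facet parameters (`facet=true`, `facet.field=DOC_TYPE`) and returns Solr's default flat facet list in JSON. The function names are hypothetical.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # same host and port as the URLs in this post


def doc_type_facet_url(core: str = "alfresco") -> str:
    """Build a facet query counting documents per DOC_TYPE (assumed to be permitted)."""
    params = {
        "q": "*:*",
        "rows": 0,  # only the facet counts are needed, not the documents
        "facet": "true",
        "facet.field": "DOC_TYPE",
        "wt": "json",
    }
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


def parse_doc_type_counts(response: dict) -> dict:
    """Turn Solr's flat [name, count, name, count, ...] facet list into a dict."""
    flat = response["facet_counts"]["facet_fields"]["DOC_TYPE"]
    return dict(zip(flat[0::2], flat[1::2]))
```

Fetching the URL (for example with `urllib.request.urlopen`) and decoding the JSON body is left to the caller.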

 


The following steps are required to remove and re-index the latest transactions in a SOLR core:

  • Get the properties of the last transaction to keep (Id 40 in the sample below)
  • PURGE the unwanted TX Ids from the SOLR core using the SOLR Admin REST API
  • Disable SOLR tracking
  • Update (technically, "add") the TRACKER!STATE!TX SOLR document with the properties of the transaction to keep
  • Enable SOLR tracking
  • Verify that the pending transactions are indexed

The documents of type "State" record the latest indexed transaction (TRACKER!STATE!TX) and the latest indexed permission change (TRACKER!STATE!ACLTX):

http://localhost:8983/solr/alfresco/select?fl=*,[cached]&indent=on&q={!term%20f=DOC_TYPE}State&wt=js...

[
      {
        "id":"TRACKER!STATE!ACLTX",
        "_version_":1736233537222213632,
        "S_ACLTXID":9,
        "S_INACLTXID":9,
        "S_ACLTXCOMMITTIME":1655799514574,
        "DOC_TYPE":"State",
        "LAST_INCOMING_CONTENT_VERSION_ID":-10},
      {
        "id":"TRACKER!STATE!TX",
        "_version_":1736234575631220736,
        "S_TXID":43,
        "S_INTXID":43,
        "S_TXCOMMITTIME":1655799968144,
        "DOC_TYPE":"State",
        "LAST_INCOMING_CONTENT_VERSION_ID":-10
     }
]

In the sample above, the latest indexed transaction for the alfresco core is 43, with a commit time of 1655799968144.
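Reading these two values from the response can be scripted. The following is a minimal sketch, assuming `state_docs` is the list of documents returned by the State query; the function name is hypothetical.

```python
def latest_indexed_tx(state_docs: list) -> tuple:
    """Return (S_TXID, S_TXCOMMITTIME) from the TRACKER!STATE!TX document."""
    for doc in state_docs:
        if doc.get("id") == "TRACKER!STATE!TX":
            return doc["S_TXID"], doc["S_TXCOMMITTIME"]
    raise LookupError("TRACKER!STATE!TX not found in the State documents")


# Documents copied from the sample response above (fields trimmed for brevity).
state_docs = [
    {"id": "TRACKER!STATE!ACLTX", "S_ACLTXID": 9, "S_ACLTXCOMMITTIME": 1655799514574},
    {"id": "TRACKER!STATE!TX", "S_TXID": 43, "S_TXCOMMITTIME": 1655799968144},
]
print(latest_indexed_tx(state_docs))  # (43, 1655799968144)
```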

In order to remove transactions 41, 42 and 43 from the SOLR core, first get the properties of transaction 40, which will become the latest indexed transaction:

http://localhost:8983/solr/alfresco/select?fl=*,[cached]&indent=on&q=DOC_TYPE:%22Tx%22%20AND%20TXID:...

{
    "id":"TRACKER!TX!8000000000000028",
    "_version_":1736233557160886272,
    "TXID":40,
    "INTXID":40,
    "TXCOMMITTIME":1655799960307,
    "DOC_TYPE":"Tx",
    "int@s_@cascade":0,
    "LAST_INCOMING_CONTENT_VERSION_ID":-10
}

Remove the transactions from SOLR Core using the SOLR Admin REST API:

$ curl --location --request GET 'http://localhost:8983/solr/admin/cores?action=purge&txid=43'

$ curl --location --request GET 'http://localhost:8983/solr/admin/cores?action=purge&txid=42'

$ curl --location --request GET 'http://localhost:8983/solr/admin/cores?action=purge&txid=41'
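When more than a handful of transactions must be purged, the URLs can be generated rather than typed. This is a minimal sketch under the assumption that every Id between the transaction to keep and the latest one should be purged; how the purge action reacts to an Id that is not in the index is not covered here. The function name and parameters are hypothetical.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"


def purge_urls(keep_txid: int, latest_txid: int) -> list:
    """Purge URLs for every transaction after keep_txid, newest first."""
    if keep_txid >= latest_txid:
        return []  # nothing to purge
    return [
        f"{SOLR_BASE}/admin/cores?{urlencode({'action': 'purge', 'txid': txid})}"
        for txid in range(latest_txid, keep_txid, -1)
    ]


for url in purge_urls(keep_txid=40, latest_txid=43):
    print(url)
```

The loop prints the same three URLs used in the curl commands above, in the same order (43, 42, 41).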

This operation may take a while. Before moving forward, verify that the latest transaction in the SOLR core is the expected one (40 in this example):

http://localhost:8983/solr/alfresco/select?fl=*,[cached]&indent=on&q=DOC_TYPE:%22Tx%22&rows=1&sort=T...

{
        "id":"TRACKER!TX!8000000000000028",
        "_version_":1736233557160886272,
        "TXID":40,
        "INTXID":40,
        "TXCOMMITTIME":1655799960307,
        "DOC_TYPE":"Tx",
        "int@s_@cascade":0,
        "LAST_INCOMING_CONTENT_VERSION_ID":-10
}
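Because the purge is not immediate, the check may need to be repeated. A minimal polling sketch follows; `fetch_latest_txid` is a hypothetical caller-supplied function that runs the verification query above and returns the TXID of the first document, and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable


def wait_for_latest_tx(fetch_latest_txid: Callable[[], int], expected: int,
                       attempts: int = 10, delay_s: float = 5.0) -> bool:
    """Poll until the newest Tx document has the expected TXID, or give up."""
    for attempt in range(attempts):
        if fetch_latest_txid() == expected:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # wait before asking SOLR again
    return False


# Stub standing in for the real query: the index shrinks from 42 to 40.
readings = iter([42, 41, 40])
print(wait_for_latest_tx(lambda: next(readings), expected=40, delay_s=0))  # True
```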

Once the transactions have been removed from the SOLR core, the status document TRACKER!STATE!TX needs to be modified. Before performing this update, stop Alfresco Search Services and add the following configuration to solrcore.properties to disable the tracking process. Set this property in both cores, alfresco and archive:

enable.alfresco.tracking=false
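Since the property has to be changed in two files and reverted later, the edit can be scripted. The sketch below only transforms the text of a properties file; it assumes plain `key=value` lines (no `:` separators or line continuations), and the function name is hypothetical. The location of each core's solrcore.properties depends on the installation, so reading and writing the files is left out.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with key set to value, appending it if absent."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments; match on the part before the first "=".
        if not stripped.startswith("#") and stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


print(set_property("enable.alfresco.tracking=true\n", "enable.alfresco.tracking", "false"), end="")
# enable.alfresco.tracking=false
```

Calling the same function with `"true"` reverts the change once the status document has been updated.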

Once Alfresco Search Services is up and running again, use the following command to update the status document with the properties of transaction 40:

$ curl --location --request POST \
'http://localhost:8983/solr/alfresco/update?commitWithin=1000&overwrite=true&wt=json' \
--header 'Content-Type: application/json' \
--data-raw '[
    {
        "id":"TRACKER!STATE!TX",
        "_version_":1,
        "S_TXID":40,
        "S_INTXID":40,
        "S_TXCOMMITTIME":1655799960307,
        "DOC_TYPE":"State",
        "LAST_INCOMING_CONTENT_VERSION_ID":-10
    }
]'
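The payload above is the Tx document of transaction 40 with its fields renamed to the `S_` names used by the State document. A minimal sketch of that mapping is shown below; the function name is hypothetical, and the `_version_` and `LAST_INCOMING_CONTENT_VERSION_ID` values are simply copied from the sample payload.

```python
import json


def state_doc_from_tx(tx_doc: dict) -> dict:
    """Map the fields of a Tx document onto the TRACKER!STATE!TX document."""
    return {
        "id": "TRACKER!STATE!TX",
        "_version_": 1,  # value used in the sample payload above
        "S_TXID": tx_doc["TXID"],
        "S_INTXID": tx_doc["INTXID"],
        "S_TXCOMMITTIME": tx_doc["TXCOMMITTIME"],
        "DOC_TYPE": "State",
        "LAST_INCOMING_CONTENT_VERSION_ID": -10,  # value used in the sample payload above
    }


# Properties of transaction 40, as returned by the Tx query earlier in this post.
tx_40 = {"TXID": 40, "INTXID": 40, "TXCOMMITTIME": 1655799960307}
print(json.dumps([state_doc_from_tx(tx_40)], indent=4))
```

The printed JSON array can be used as the body of the POST request to the `/update` endpoint.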

Stop Alfresco Search Services again and revert the previous configuration in the solrcore.properties files:

enable.alfresco.tracking=true

Once Alfresco Search Services is up and running, the transactions after Id 40 will be indexed again by the regular tracking process. After a while, the latest transaction can be verified as 43 in both the Tx and TRACKER!STATE!TX documents:

http://localhost:8983/solr/alfresco/select?fl=*,[cached]&indent=on&q={!term%20f=DOC_TYPE}State&wt=js...

{
        "id":"TRACKER!STATE!TX",
        "_version_":1736237510552453120,
        "S_TXID":43,
        "S_INTXID":43,
        "S_TXCOMMITTIME":1655799968144,
        "DOC_TYPE":"State",
        "LAST_INCOMING_CONTENT_VERSION_ID":-10
}

http://localhost:8983/solr/alfresco/select?fl=*,[cached]&indent=on&q=DOC_TYPE:%22Tx%22&rows=1&sort=T...

{
        "id":"TRACKER!TX!800000000000002b",
        "_version_":1736237530581303296,
        "TXID":43,
        "INTXID":43,
        "TXCOMMITTIME":1655799968144,
        "DOC_TYPE":"Tx",
        "int@s_@cascade":0,
        "LAST_INCOMING_CONTENT_VERSION_ID":-10
}
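The final check amounts to confirming that the State document and the newest Tx document agree. A minimal sketch, assuming both documents have already been fetched with the two queries above (the function name is hypothetical):

```python
def tracker_caught_up(state_doc: dict, latest_tx_doc: dict) -> bool:
    """True when TRACKER!STATE!TX points at the newest Tx document in the index."""
    return (
        state_doc["S_TXID"] == latest_tx_doc["TXID"]
        and state_doc["S_TXCOMMITTIME"] == latest_tx_doc["TXCOMMITTIME"]
    )


# Values taken from the two sample responses above.
state = {"S_TXID": 43, "S_TXCOMMITTIME": 1655799968144}
newest_tx = {"TXID": 43, "TXCOMMITTIME": 1655799968144}
print(tracker_caught_up(state, newest_tx))  # True
```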

 

Additional notes

An alternative approach to disabling indexing, contributed by @morganp1, is to use the "disable-indexing" action of the SOLR REST API:

http://localhost:8983/solr/admin/cores?action=disable-indexing

<response>
<lst name="action">
  <lst name="alfresco">
   <bool name="CASCADE">false</bool>
   <bool name="CONTENT">false</bool>
   <bool name="ACL">false</bool>
   <bool name="METADATA">false</bool>
 </lst>
 <lst name="archive">
   <bool name="CASCADE">false</bool>
   <bool name="CONTENT">false</bool>
   <bool name="ACL">false</bool>
   <bool name="METADATA">false</bool>
 </lst>
</lst>
</response>

This operation doesn't require restarting the SOLR server, which may be preferable for some use cases.

To resume the indexing process, use the opposite action:

http://localhost:8983/solr/admin/cores?action=enable-indexing

<response>
<lst name="action">
  <lst name="alfresco">
   <bool name="CASCADE">true</bool>
   <bool name="CONTENT">true</bool>
   <bool name="ACL">true</bool>
   <bool name="METADATA">true</bool>
 </lst>
 <lst name="archive">
   <bool name="CASCADE">true</bool>
   <bool name="CONTENT">true</bool>
   <bool name="ACL">true</bool>
   <bool name="METADATA">true</bool>
 </lst>
</lst>
</response>
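The XML responses of both actions can be checked from a script to confirm that every tracker in every core ended up in the expected state. The following is a minimal sketch using the standard library XML parser; it assumes the response has the shape shown in the samples above, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def tracker_flags(xml_text: str) -> dict:
    """Return {core: {tracker: bool}} from a disable/enable-indexing response."""
    root = ET.fromstring(xml_text)
    action = root.find("lst[@name='action']")
    return {
        core.get("name"): {
            flag.get("name"): flag.text.strip() == "true"
            for flag in core.findall("bool")
        }
        for core in action.findall("lst")
    }


# Trimmed version of the disable-indexing sample response.
sample = """<response>
<lst name="action">
  <lst name="alfresco">
    <bool name="CONTENT">false</bool>
    <bool name="METADATA">false</bool>
  </lst>
</lst>
</response>"""
flags = tracker_flags(sample)
print(all(not on for core in flags.values() for on in core.values()))  # True
```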

 

About the Author
Angel Borroy is a Hyland Developer Evangelist. Over the last 15 years, he has been working as a software architect on Java, BPM, document management and electronic signatures. He has been working with Alfresco in recent years to customize several implementations in large organizations and to provide add-ons to the Community based on Record Management and Electronic Signature. He writes (sometimes) on his personal blog http://angelborroy.wordpress.com. He is a (proud) member of the Order of the Bee.