Search anomalies

brunotony
Member II

Search anomalies

Hello everyone,
I have noticed (thanks to our users) that search (in advanced mode) does not return the expected results. For example, a test document contains the words "données", "charleroi" and "dont".
When I run an exact-match search for the word "don", using the syntax "don" or =don, the result returns my document even though it does not contain that word. However, if the word "données" in the same document is changed to "donnees" (without the acute accent), an exact-match search for "don" returns no result, which is the logical outcome.
Another anomaly, still with the same document: when I run an exact-match search in advanced mode for the word "dont", using "dont" or =dont (the document does contain this word), Alfresco returns no result even though the word is present.

The production Alfresco is Version: Enterprise - v3.1.1 (229)

Here is the Lucene configuration contained in the repository.properties file:

# Repository configuration

repository.name=Main Repository

# Directory configuration

dir.root=./alf_data

dir.contentstore=${dir.root}/contentstore
dir.contentstore.deleted=${dir.root}/contentstore.deleted

dir.auditcontentstore=${dir.root}/audit.contentstore

# The location for lucene index files
dir.indexes=${dir.root}/lucene-indexes

# The location for index backups
dir.indexes.backup=${dir.root}/backup-lucene-indexes

# The location for lucene index locks
dir.indexes.lock=${dir.indexes}/locks

# ######################################### #
# Index Recovery and Tracking Configuration #
# ######################################### #
#
# Recovery types are:
#    NONE:     Ignore
#    VALIDATE: Checks that the first and last transaction for each store is represented in the indexes
#    AUTO:     Validates and auto-recovers if validation fails
#    FULL:     Full index rebuild, processing all transactions in order.  The server is temporarily suspended.
index.recovery.mode=VALIDATE
# FULL recovery continues when encountering errors
index.recovery.stopOnError=false
index.recovery.maximumPoolSize=5
# Set the frequency with which the index tracking is triggered.
# For more information on index tracking in a cluster:
#    http://wiki.alfresco.com/wiki/High_Availability_Configuration_V1.4_to_V2.1#Version_1.4.5.2C_2.1.1_an...
# By default, this is effectively never, but can be modified as required.
#    Examples:
#       Never:                   * * * * * ? 2099
#       Once every five seconds: 0/5 * * * * ?
#       Once every two seconds : 0/2 * * * * ?
#       See http://quartz.sourceforge.net/javadoc/org/quartz/CronTrigger.html
index.tracking.cronExpression=0/5 * * * * ?
index.tracking.adm.cronExpression=${index.tracking.cronExpression}
index.tracking.avm.cronExpression=${index.tracking.cronExpression}
# Other properties.
index.tracking.maxTxnDurationMinutes=10
index.tracking.reindexLagMs=1000
index.tracking.maxRecordSetSize=1000
index.tracking.maxTransactionsPerLuceneCommit=100
index.tracking.disableInTransactionIndexing=false
# Index tracking information of a certain age is cleaned out by a scheduled job.
# Any clustered system that has been offline for longer than this period will need to be seeded
# with a more recent backup of the Lucene indexes or the indexes will have to be fully rebuilt.
# Use -1 to disable purging.  This can be switched on at any stage.
index.tracking.minRecordPurgeAgeDays=30

# Change the failure behaviour of the configuration checker
system.bootstrap.config_check.strict=true

# The name of the cluster
#   Leave this empty to disable cluster entry
alfresco.cluster.name=

# JGroups configuration (http://www.jgroups.org)
# The location of the JGroups configuration file
#   It is also possible to override this by just dropping a file in classpath:alfresco/extension/jgroups-custom.xml
alfresco.jgroups.configLocation=classpath:alfresco/jgroups-default.xml
# The protocol stack to use from the JGroups configuration file
#   The JGroups configuration files are divided into protocol stacks.
#   Use this property to select which communication method should be used.
alfresco.jgroups.defaultProtocol=UDP

#
# How long should shutdown wait to complete normally before
# taking stronger action and calling System.exit()
# in ms, 10,000 is 10 seconds
#
shutdown.backstop.timeout=10000
shutdown.backstop.enabled=true

# Server Single User Mode
# note:
#   only allow named user (note: if blank or not set then will allow all users)
#   assuming maxusers is not set to 0
#server.singleuseronly.name=admin

# Server Max Users - limit number of users with non-expired tickets
# note:
#   -1 allows any number of users, assuming not in single-user mode
#   0 prevents further logins, including the ability to enter single-user mode
server.maxusers=-1

# The Cron expression controlling the frequency with which the OpenOffice connection is tested
openOffice.test.cronExpression=0 * * * * ?

#
# Properties to limit resources spent on individual searches
#
# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=10000
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=1000

#
# Manually control how the system handles maximum string lengths.
# Any zero or negative value is ignored.
# Only change this after consulting support or reading the appropriate Javadocs for
# org.alfresco.repo.domain.schema.SchemaBootstrap for V2.1.2
system.maximumStringLength=-1

#
# Limit hibernate session size by trying to amalgamate events for the L2 session invalidation
# - hibernate works as is up to this size
# - after the limit is hit events that can be grouped invalidate the L2 cache by type and not instance
# events may not group if there are post action listener registered (this is not the case with the default distribution)
system.hibernateMaxExecutions=20000

#
# Determine if document deletion and archival must cascade delete in the same
# transaction that triggers the operation.
system.cascadeDeleteInTransaction=true

#
# Determine if modification timestamp propagation from child to parent nodes is respected or not.
# Even if 'true', the functionality is only supported for child associations that declare the
# 'propagateTimestamps' element in the dictionary definition.
system.enableTimestampPropagation=false

# #################### #
# Lucene configuration #
# #################### #
#
# Millisecond threshold for text transformations
# Slower transformers will force the text extraction to be asynchronous
#
lucene.maxAtomicTransformationTime=20
#
# The maximum number of clauses that are allowed in a lucene query
#
lucene.query.maxClauses=10000
#
# The size of the queue of nodes waiting for index
# Events are generated as nodes are changed, this is the maximum size of the queue used to coalesce event
# When this size is reached the lists of nodes will be indexed
#
# http://issues.alfresco.com/browse/AR-1280:  Setting this high is the workaround as of 1.4.3.
#
lucene.indexer.batchSize=1000000
#
# Lucene index min merge docs - the in memory size of the index
#
lucene.indexer.minMergeDocs=1000
#
# When lucene index files are merged together - it will try to keep this number of segments/files in 
#
lucene.indexer.mergeFactor=10
#
# Roughly the maximum number of nodes indexed in one file/segment
#
lucene.indexer.maxMergeDocs=100000
#
# The number of terms from a document that will be indexed
#
lucene.indexer.maxFieldLength=10000

lucene.write.lock.timeout=10000
lucene.commit.lock.timeout=100000
lucene.lock.poll.interval=100

# Database configuration
db.schema.stopAfterSchemaBootstrap=false
db.schema.update=true
db.schema.update.lockRetryCount=24
db.schema.update.lockRetryWaitSeconds=5
db.driver=org.gjt.mm.mysql.Driver
db.name=alfresco
db.url=jdbc:mysql:///${db.name}
db.username=alfresco
db.password=alfresco
db.pool.initial=10
db.pool.max=40
db.txn.isolation=-1

# Email configuration
mail.host=vdcexchange
mail.port=25
mail.username=bruno
mail.password=**********
# Set this value to UTF-8 or similar for encoding of email messages as required
mail.encoding=UTF-8
# Set this value to 7bit or similar for Asian encoding of email headers as required
mail.header=
mail.from.default=alfresco@alfresco.org

# System Configuration
system.store=system://system
system.descriptor.childname=sys:descriptor
system.descriptor.current.childname=sys:descriptor-current

# User config
alfresco_user_store.store=user://alfrescoUserStore
alfresco_user_store.system_container.childname=sys:system
alfresco_user_store.user_container.childname=sys:people
alfresco_user_store.authorities_container.childname=sys:authorities

# note: default admin username - should not be changed after installation
alfresco_user_store.adminusername=admin

# note: default guest username - should not be changed after installation
alfresco_user_store.guestusername=guest

# Spaces Archive Configuration
spaces.archive.store=archive://SpacesStore

# Spaces Configuration
spaces.store=workspace://SpacesStore
spaces.company_home.childname=app:company_home
spaces.guest_home.childname=app:guest_home
spaces.dictionary.childname=app:dictionary
spaces.templates.childname=app:space_templates
spaces.templates.content.childname=app:content_templates
spaces.templates.email.childname=app:email_templates
spaces.templates.rss.childname=app:rss_templates
spaces.savedsearches.childname=app:saved_searches
spaces.scripts.childname=app:scripts
spaces.wcm.childname=app:wcm
spaces.wcm_content_forms.childname=app:wcm_forms
spaces.content_forms.childname=app:forms
spaces.user_homes.childname=app:user_homes
spaces.sites.childname=st:sites
spaces.templates.email.invite.childname=cm:invite

# ADM VersionStore Configuration
version.store.deprecated.lightWeightVersionStore=workspace://lightWeightVersionStore
version.store.version2Store=workspace://version2Store
# WARNING: For non-production testing only !!! Do not change (to avoid version store issues, including possible mismatch). Should be false since lightWeightVersionStore is deprecated.
version.store.onlyUseDeprecatedV1=false

# Folders for storing people
system.system_container.childname=sys:system
system.people_container.childname=sys:people

# Folders for storing workflow related info
system.workflow_container.childname=sys:workflow

# Are user names case sensitive?
user.name.caseSensitive=false
domain.name.caseSensitive=false
domain.separator=

# AVM Specific properties.
avm.remote.idlestream.timeout=30000

# ################################## #
# WCM Link Validation Configuration  #
# ################################## #
#
# Note: Link Validation is disabled by default (as per poll interval = 0)
#
# linkvalidation.pollInterval  - Poll interval to check getLatestSnapshotID (in milliseconds), eg. 5000 for 5 sec interval
#                           If pollInterval is 0, link validation is disabled.
#
# linkvalidation.retryInterval - Retry interval (Virtualization server is not accessible or an error has occurred
#                          during link validation.
#
# linkvalidation.disableOnFail - If set to TRUE link validation service will be terminated if an error will be occurred.

linkvalidation.pollInterval=0
linkvalidation.retryInterval=120000
linkvalidation.disableOnFail=false

# ECM content usages/quotas
system.usages.enabled=true

# Repository endpoint - used by Activity Service
repo.remote.endpoint.url=http://localhost:8080/alfresco/service

# Create home folders as people are created (true) or create them lazily (false)
home.folder.creation.eager=true

# The well known RMI registry port is defined in the alfresco-shared.properties file
# alfresco.rmi.services.port=50500
#
# RMI service ports for the individual services.
# These six services are available remotely.
#
# Assign individual ports for each service for best performance
# or run several services on the same port, you can even run everything on 50500 if
# running through a firewall.
#
# Specify 0 to use a random unused port.
#
avm.rmi.service.port=50501
avmsync.rmi.service.port=50502
attribute.rmi.service.port=50503
authentication.rmi.service.port=50504
repo.rmi.service.port=50505
action.rmi.service.port=50506

# External executable locations
ooo.exe=soffice
ooo.user=${dir.root}/oouser
img.root=./ImageMagick
img.dyn=${img.root}/lib
img.exe=${img.root}/bin/convert
swf.exe=./bin/pdf2swf

# Property to enable upgrade from 2.1-A
V2.1-A.fixes.to.schema=0
#V2.1-A.fixes.to.schema=82
Thanks in advance for your help.

Tony BRUNO
7 Replies
brunotony
Member II

Re: Search anomalies

Following up on my previous message.
After running another series of tests, search turns out to behave as follows:
Alfresco contains one document with only the word "dont" and another document with the phrase "dont la voiture". When I run an exact-match search for the word "dont", I get no results. However, when I search for "dont la voiture", I get not only the document containing that phrase but also the document containing only the word "dont".

I do not understand this logic at all. Could someone suggest an explanation?

Thanks in advance

Tony BRUNO
dgenard
Active Member

Re: Search anomalies

Hello,

If the documents stored in Alfresco are in French, a French-specific Lucene analyzer often gives more intuitive search results.
To do so, change the following lines in dataTypeAnalyzers.properties, dataTypeAnalyzers_fr.properties and possibly dataTypeAnalyzers_en.properties:
d_dictionary.datatype.d_text.analyzer=org.alfresco.repo.search.impl.lucene.analysis.FrenchSnowballAnalyserThatRemovesAccents
d_dictionary.datatype.d_content.analyzer=org.alfresco.repo.search.impl.lucene.analysis.FrenchSnowballAnalyserThatRemovesAccents
This gives a search that is insensitive to case, accents and grammatical inflections.
Then reindex the repository.
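
To make "insensitive to case and accents" concrete, here is a minimal sketch of accent and case folding using only the Java standard library. It is not the code of FrenchSnowballAnalyserThatRemovesAccents, and it does no stemming; the class name AccentFolding and the method fold are hypothetical names chosen for this illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

// Illustration only: NOT the Alfresco analyzer, just the general idea of
// folding case and accents before terms are compared.
public class AccentFolding {

    // Lowercase the text, decompose accented letters (NFD), then drop the
    // combining marks so that "é" becomes a plain "e".
    static String fold(String text) {
        String lower = text.toLowerCase(Locale.FRENCH);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "\u00e9" is "é"; the escape keeps the source file ASCII-only.
        String indexed = fold("Donn\u00e9es");
        String queried = fold("donnees");
        System.out.println(indexed.equals(queried)); // prints true
    }
}
```

With this kind of folding applied at both index time and query time, "Données" and "donnees" end up as the same term, which is why a reindex is needed after changing the analyzer.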

See also http://forums.alfresco.com/fr/viewtopic.php?f=8&t=4192&p=19367&hilit=FrenchSnowball#p19367

Denis
jayjayecl
Active Member II

Re: Search anomalies

Hello,

Please use the "CODE" tag, especially when quoting 140 lines of a config file. :)
(first post edited)
brunotony
Member II

Re: Search anomalies

Many thanks Denis. I tried it, but at 40 of the indexing I hit a heap memory problem (out of memory), despite 4 GB of RAM. I will look into memory management.
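
A note on the memory point: a Java heap error depends on the heap ceiling given to the JVM (typically set with -Xmx), not on the amount of physical RAM in the machine. The sketch below, with the hypothetical class name HeapReport, prints what the running JVM is allowed to use. It would need to run with the same JVM options as the Alfresco server to be meaningful, which is an assumption about the setup.

```java
// Illustration only: reports the heap limits of the JVM that runs this class.
public class HeapReport {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        System.out.println("max heap (MiB):       " + maxHeapMiB());
        System.out.println("currently allocated:  " + rt.totalMemory() / mib);
        System.out.println("free within allocated: " + rt.freeMemory() / mib);
    }
}
```

If the reported maximum is far below the 4 GB installed, the reindex is bounded by that smaller figure.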

Thanks again

Tony
rguinot
Customer

Re: Search anomalies

but at 40 of the indexing I hit a heap memory problem (out of memory), despite 4 GB of RAM.

Post the exception with the full stack trace, otherwise we cannot guess it.

Also, do reread the documentation / wiki / Lucene material on stemming, tokenization, stop words, etc.: http://wiki.alfresco.com/wiki/Search
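
As a rough illustration of why stop words matter here, the sketch below shows how a query made only of stop words can end up with no terms to search for. This is one possible explanation for the empty result on "dont", not a confirmed diagnosis. The stop list is a hypothetical one written for this example (it is not copied from Lucene or Alfresco), and StopWordSketch and analyze are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a toy analysis chain (tokenize, lowercase, drop stop words).
public class StopWordSketch {

    // Hypothetical stop list, assumed for the example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("dont", "la", "le", "les", "de"));

    // Split on anything that is not a letter or digit, lowercase,
    // and keep only the tokens that are not stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.FRENCH).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("dont"));            // prints []
        System.out.println(analyze("dont la voiture")); // prints [voiture]
    }
}
```

Under these assumptions, a search for "dont" alone has nothing left to match, while "dont la voiture" is reduced to the single term "voiture". How the real analyzer treats these words depends on its own stop list and on the query parser, so checking the analyzer configured in dataTypeAnalyzers.properties is the next step.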
dranakan
Active Member

Re: Search anomalies

Hello,

Is it possible to put this file in /tomcat/shared/classes/alfresco/extension… or to put its contents in alfresco-global.properties?
jayjayecl
Active Member II

Re: Search anomalies

The dataTypeAnalyzers.properties files? Not by default. You would have to tinker a bit with the Spring import context files (dig into core-services-context.xml to understand how this works).