C# Class Lucene.Net.Index.LiveIndexWriterConfig

Holds all the configuration used by IndexWriter, with a few setters for settings that can be changed "live" on an IndexWriter instance. Since 4.0.

Protected properties

Property Type Description
Commit IndexCommit
MatchVersion Version
PerThreadHardLimitMB int
checkIntegrityAtMerge bool
codec Codec
delPolicy IndexDeletionPolicy
flushPolicy Lucene.Net.Index.FlushPolicy
indexerThreadPool Lucene.Net.Index.DocumentsWriterPerThreadPool
indexingChain IndexingChain
infoStream Lucene.Net.Util.InfoStream
mergePolicy MergePolicy
mergeScheduler Lucene.Net.Index.MergeScheduler
openMode OpenMode_e?
readerPooling bool
similarity Similarity
useCompoundFile bool
writeLockTimeout long

Public methods

Method Description
SetCheckIntegrityAtMerge ( bool checkIntegrityAtMerge ) : LiveIndexWriterConfig

Sets whether IndexWriter should call AtomicReader#checkIntegrity() on existing segments before merging them into a new one.

Use true to enable this safety check, which can help reduce the risk of propagating index corruption from older segments into new ones, at the expense of slower merging.

SetMaxBufferedDeleteTerms ( int maxBufferedDeleteTerms ) : LiveIndexWriterConfig

Determines the maximum number of delete-by-term operations that will be buffered before both the buffered in-memory delete terms and queries are applied and flushed.

Disabled by default (writer flushes by RAM usage).

NOTE: this setting won't trigger a segment flush.

Takes effect immediately, but only the next time a document is added, updated or deleted. Also, if you only delete-by-query, this setting has no effect, i.e. delete queries are buffered until the next segment is flushed.

SetMaxBufferedDocs ( int maxBufferedDocs ) : LiveIndexWriterConfig

Determines the minimum number of documents required before the buffered in-memory documents are flushed as a new Segment. Large values generally give faster indexing.

When this is set, the writer will flush every maxBufferedDocs added documents. Pass in IndexWriterConfig#DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.

Disabled by default (writer flushes by RAM usage).

Takes effect immediately, but only the next time a document is added, updated or deleted.

SetMergedSegmentWarmer ( IndexReaderWarmer mergeSegmentWarmer ) : LiveIndexWriterConfig

Set the merged segment warmer. See IndexReaderWarmer.

Takes effect on the next merge.

SetRAMBufferSizeMB ( double ramBufferSizeMB ) : LiveIndexWriterConfig

Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can.

When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in IndexWriterConfig#DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.

The maximum RAM limit is inherently bounded by the memory available to the runtime (the JVM in the original Java documentation). However, an IndexWriter session can consume significantly more memory than the given RAM limit, since this limit only indicates when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability, the available memory should be significantly larger than the RAM buffer used for indexing.

NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries (a constant number of bytes is used to estimate the RAM usage of each buffered delete Query), so the accounting will under-estimate. You should compensate either by calling commit() periodically yourself, or by using SetMaxBufferedDeleteTerms(int) to flush and apply buffered deletes by count instead of RAM usage. Note that enabling SetMaxBufferedDeleteTerms(int) will not trigger any segment flushes.

NOTE: It is not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents may be flushed, and therefore only part of the RAM buffer is released.

The default value is IndexWriterConfig#DEFAULT_RAM_BUFFER_SIZE_MB.

Takes effect immediately, but only the next time a document is added, updated or deleted.

SetReaderTermsIndexDivisor ( int divisor ) : LiveIndexWriterConfig

Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader#open(IndexWriter, boolean). If you pass -1, the terms index won't be loaded by the readers. This is only useful in advanced situations when you will only call .Next() to step through all terms; attempts to seek will hit an exception.

Takes effect immediately, but only applies to readers opened after this call.

NOTE: divisor settings > 1 do not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for terms indexes that can efficiently re-sample terms at load time.

SetTermIndexInterval ( int interval ) : LiveIndexWriterConfig

Expert: set the interval between indexed terms. Large values cause less memory to be used by IndexReader, but slow random-access to terms. Small values cause more memory to be used by an IndexReader, and speed random-access to terms.

This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index, or when many uncommon query terms are generated (e.g., by wildcard queries), term lookup may become a dominant cost.

In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.

Takes effect immediately, but only applies to newly flushed/merged segments.

NOTE: This parameter does not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for term indexes that are implemented as a fixed gap between terms. For example, Lucene41PostingsFormat instead implements the term index based upon how terms share prefixes. To configure its parameters (the minimum and maximum size for a block), you would instead use Lucene41PostingsFormat#Lucene41PostingsFormat(int, int), which can also be configured on a per-field basis (the snippet is Java, carried over from the upstream Lucene documentation):

    // Customize Lucene41PostingsFormat, passing minBlockSize=50, maxBlockSize=100
    final PostingsFormat tweakedPostings = new Lucene41PostingsFormat(50, 100);
    iwc.setCodec(new Lucene45Codec() {
      @Override
      public PostingsFormat getPostingsFormatForField(String field) {
        // Use the tweaked format only for the one very large field
        if (field.equals("fieldWithTonsOfTerms"))
          return tweakedPostings;
        else
          return super.getPostingsFormatForField(field);
      }
    });

Note that other implementations may have their own parameters, or no parameters at all.

SetUseCompoundFile ( bool useCompoundFile ) : LiveIndexWriterConfig

Sets whether the IndexWriter should pack newly written segments in a compound file. Default is true.

Use false for batch indexing with very large RAM buffer settings.

Note: To control compound file usage during segment merges, see MergePolicy#setNoCFSRatio(double) and MergePolicy#setMaxCFSSegmentSizeMB(double). This setting only applies to newly created segments.

ToString ( ) : string
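
Every setter listed above returns LiveIndexWriterConfig, which suggests that calls can be chained. The following is a minimal sketch of that pattern only: LiveConfigSketch is a hypothetical stand-in written for this illustration, not the Lucene.Net class, and its starting values are assumptions.

```csharp
using System;

// Hypothetical stand-in that mirrors the setter signatures listed above.
// It is NOT the Lucene.Net class; it only illustrates the chaining pattern.
public sealed class LiveConfigSketch
{
    public double RamBufferSizeMB { get; private set; } = 16.0; // assumed starting value
    public bool UseCompoundFile { get; private set; } = true;   // "Default is true" per the text
    public bool CheckIntegrityAtMerge { get; private set; }

    // Each setter stores the value and returns the same instance.
    public LiveConfigSketch SetRAMBufferSizeMB(double ramBufferSizeMB)
    {
        RamBufferSizeMB = ramBufferSizeMB;
        return this;
    }

    public LiveConfigSketch SetUseCompoundFile(bool useCompoundFile)
    {
        UseCompoundFile = useCompoundFile;
        return this;
    }

    public LiveConfigSketch SetCheckIntegrityAtMerge(bool checkIntegrityAtMerge)
    {
        CheckIntegrityAtMerge = checkIntegrityAtMerge;
        return this;
    }

    public static void Main()
    {
        // Chain several setters in one expression.
        var config = new LiveConfigSketch()
            .SetRAMBufferSizeMB(64.0)
            .SetUseCompoundFile(false)
            .SetCheckIntegrityAtMerge(true);

        Console.WriteLine(config.RamBufferSizeMB);
        Console.WriteLine(config.UseCompoundFile);
        Console.WriteLine(config.CheckIntegrityAtMerge);
    }
}
```

With the real class, the same chaining would be applied to the config object obtained from the IndexWriter.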

Private methods

Method Description
LiveIndexWriterConfig ( Analyzer analyzer, Version matchVersion )
LiveIndexWriterConfig ( IndexWriterConfig config )

Creates a new config that handles the live IndexWriter settings.

Method details

SetCheckIntegrityAtMerge() public method

Sets whether IndexWriter should call AtomicReader#checkIntegrity() on existing segments before merging them into a new one.

Use true to enable this safety check, which can help reduce the risk of propagating index corruption from older segments into new ones, at the expense of slower merging.

public SetCheckIntegrityAtMerge ( bool checkIntegrityAtMerge ) : LiveIndexWriterConfig
checkIntegrityAtMerge bool
Returns LiveIndexWriterConfig

SetMaxBufferedDeleteTerms() public method

Determines the maximum number of delete-by-term operations that will be buffered before both the buffered in-memory delete terms and queries are applied and flushed.

Disabled by default (writer flushes by RAM usage).

NOTE: this setting won't trigger a segment flush.

Takes effect immediately, but only the next time a document is added, updated or deleted. Also, if you only delete-by-query, this setting has no effect, i.e. delete queries are buffered until the next segment is flushed.

Throws an exception if maxBufferedDeleteTerms is enabled but smaller than 1.
public SetMaxBufferedDeleteTerms ( int maxBufferedDeleteTerms ) : LiveIndexWriterConfig
maxBufferedDeleteTerms int
Returns LiveIndexWriterConfig

SetMaxBufferedDocs() public method

Determines the minimum number of documents required before the buffered in-memory documents are flushed as a new Segment. Large values generally give faster indexing.

When this is set, the writer will flush every maxBufferedDocs added documents. Pass in IndexWriterConfig#DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.

Disabled by default (writer flushes by RAM usage).

Takes effect immediately, but only the next time a document is added, updated or deleted.

Throws an exception if maxBufferedDocs is enabled but smaller than 2, or if it disables maxBufferedDocs when ramBufferSize is already disabled.
public SetMaxBufferedDocs ( int maxBufferedDocs ) : LiveIndexWriterConfig
maxBufferedDocs int
Returns LiveIndexWriterConfig
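
The "whichever comes first" rule described for SetMaxBufferedDocs can be pictured with a small model. This is a simplified sketch, not the actual FlushPolicy logic: the class and method names are hypothetical, the sentinel value -1 for DISABLE_AUTO_FLUSH is an assumption, and real flushing works per indexing thread rather than on a single global counter.

```csharp
using System;

// Simplified, hypothetical model of the two flush triggers.
public static class FlushTriggerSketch
{
    // Assumed sentinel standing in for IndexWriterConfig#DISABLE_AUTO_FLUSH.
    public const int DisableAutoFlush = -1;

    // Returns true as soon as either enabled limit has been reached.
    public static bool ShouldFlush(int bufferedDocs, double bufferedMB,
                                   int maxBufferedDocs, double ramBufferSizeMB)
    {
        bool byDocCount = maxBufferedDocs != DisableAutoFlush
                          && bufferedDocs >= maxBufferedDocs;
        bool byRam = ramBufferSizeMB != DisableAutoFlush
                     && bufferedMB >= ramBufferSizeMB;
        return byDocCount || byRam;
    }

    public static void Main()
    {
        // Document-count limit is reached while RAM usage is still low.
        Console.WriteLine(ShouldFlush(1000, 3.5, 1000, 16.0));
        // RAM limit is reached while the document count is still low.
        Console.WriteLine(ShouldFlush(200, 16.2, 1000, 16.0));
        // Document-count trigger disabled and RAM limit not yet reached.
        Console.WriteLine(ShouldFlush(200, 3.5, DisableAutoFlush, 16.0));
    }
}
```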

SetMergedSegmentWarmer() public method

Set the merged segment warmer. See IndexReaderWarmer.

Takes effect on the next merge.

public SetMergedSegmentWarmer ( IndexReaderWarmer mergeSegmentWarmer ) : LiveIndexWriterConfig
mergeSegmentWarmer IndexReaderWarmer
Returns LiveIndexWriterConfig

SetRAMBufferSizeMB() public method

Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can.

When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in IndexWriterConfig#DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.

The maximum RAM limit is inherently bounded by the memory available to the runtime (the JVM in the original Java documentation). However, an IndexWriter session can consume significantly more memory than the given RAM limit, since this limit only indicates when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability, the available memory should be significantly larger than the RAM buffer used for indexing.

NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries (a constant number of bytes is used to estimate the RAM usage of each buffered delete Query), so the accounting will under-estimate. You should compensate either by calling commit() periodically yourself, or by using SetMaxBufferedDeleteTerms(int) to flush and apply buffered deletes by count instead of RAM usage. Note that enabling SetMaxBufferedDeleteTerms(int) will not trigger any segment flushes.

NOTE: It is not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents may be flushed, and therefore only part of the RAM buffer is released.

The default value is IndexWriterConfig#DEFAULT_RAM_BUFFER_SIZE_MB.

Takes effect immediately, but only the next time a document is added, updated or deleted.

Throws an exception if ramBufferSize is enabled but non-positive, or if it disables ramBufferSize when maxBufferedDocs is already disabled.
public SetRAMBufferSizeMB ( double ramBufferSizeMB ) : LiveIndexWriterConfig
ramBufferSizeMB double
Returns LiveIndexWriterConfig
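
The argument checks stated for SetMaxBufferedDocs and SetRAMBufferSizeMB amount to one rule: an enabled value must be in range, and the two flush triggers cannot both be disabled. The sketch below models those checks under stated assumptions: FlushSettingsSketch is a hypothetical class, the sentinel -1 for DISABLE_AUTO_FLUSH and the starting values are assumptions, and ArgumentException is chosen for illustration since the text does not name the exception type.

```csharp
using System;

// Hypothetical model of the documented argument checks; not the Lucene.Net source.
public sealed class FlushSettingsSketch
{
    // Assumed sentinel standing in for IndexWriterConfig#DISABLE_AUTO_FLUSH.
    public const int DisableAutoFlush = -1;

    private int maxBufferedDocs = DisableAutoFlush; // "Disabled by default" per the text
    private double ramBufferSizeMB = 16.0;          // assumed starting value

    public FlushSettingsSketch SetMaxBufferedDocs(int value)
    {
        if (value != DisableAutoFlush && value < 2)
            throw new ArgumentException("maxBufferedDocs must be at least 2 when enabled");
        if (value == DisableAutoFlush && ramBufferSizeMB == DisableAutoFlush)
            throw new ArgumentException("at least one flush trigger must stay enabled");
        maxBufferedDocs = value;
        return this;
    }

    public FlushSettingsSketch SetRAMBufferSizeMB(double value)
    {
        if (value != DisableAutoFlush && value <= 0.0)
            throw new ArgumentException("ramBufferSizeMB must be positive when enabled");
        if (value == DisableAutoFlush && maxBufferedDocs == DisableAutoFlush)
            throw new ArgumentException("at least one flush trigger must stay enabled");
        ramBufferSizeMB = value;
        return this;
    }

    public static void Main()
    {
        var settings = new FlushSettingsSketch();
        settings.SetMaxBufferedDocs(500);              // accepted: enabled and at least 2
        settings.SetRAMBufferSizeMB(DisableAutoFlush); // accepted: doc-count trigger is still on
        try
        {
            settings.SetMaxBufferedDocs(DisableAutoFlush); // rejected: would disable both triggers
        }
        catch (ArgumentException e)
        {
            Console.WriteLine("Rejected: " + e.Message);
        }
    }
}
```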

SetReaderTermsIndexDivisor() public method

Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader#open(IndexWriter, boolean). If you pass -1, the terms index won't be loaded by the readers. This is only useful in advanced situations when you will only call .Next() to step through all terms; attempts to seek will hit an exception.

Takes effect immediately, but only applies to readers opened after this call.

NOTE: divisor settings > 1 do not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for terms indexes that can efficiently re-sample terms at load time.

public SetReaderTermsIndexDivisor ( int divisor ) : LiveIndexWriterConfig
divisor int
Returns LiveIndexWriterConfig

SetTermIndexInterval() public method

Expert: set the interval between indexed terms. Large values cause less memory to be used by IndexReader, but slow random-access to terms. Small values cause more memory to be used by an IndexReader, and speed random-access to terms.

This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index, or when many uncommon query terms are generated (e.g., by wildcard queries), term lookup may become a dominant cost.

In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.

Takes effect immediately, but only applies to newly flushed/merged segments.

NOTE: This parameter does not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for term indexes that are implemented as a fixed gap between terms. For example, Lucene41PostingsFormat instead implements the term index based upon how terms share prefixes. To configure its parameters (the minimum and maximum size for a block), you would instead use Lucene41PostingsFormat#Lucene41PostingsFormat(int, int), which can also be configured on a per-field basis (the snippet is Java, carried over from the upstream Lucene documentation):

    // Customize Lucene41PostingsFormat, passing minBlockSize=50, maxBlockSize=100
    final PostingsFormat tweakedPostings = new Lucene41PostingsFormat(50, 100);
    iwc.setCodec(new Lucene45Codec() {
      @Override
      public PostingsFormat getPostingsFormatForField(String field) {
        // Use the tweaked format only for the one very large field
        if (field.equals("fieldWithTonsOfTerms"))
          return tweakedPostings;
        else
          return super.getPostingsFormatForField(field);
      }
    });

Note that other implementations may have their own parameters, or no parameters at all.

public SetTermIndexInterval ( int interval ) : LiveIndexWriterConfig
interval int
Returns LiveIndexWriterConfig
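
The trade-off stated for SetTermIndexInterval (numUniqueTerms/interval terms held in memory, interval/2 terms scanned on average per random access) can be worked through numerically. The sketch below only evaluates those two formulas; the class name, the helper names, and the term count of 10,000,000 are hypothetical values picked for illustration, and it applies only to fixed-gap term indexes as the note above explains.

```csharp
using System;

// Evaluates the two estimates quoted in the text for a fixed-gap term index.
public static class TermIndexEstimate
{
    // Terms read into memory by a reader: numUniqueTerms / interval.
    public static long TermsHeldInMemory(long numUniqueTerms, int interval)
    {
        return numUniqueTerms / interval;
    }

    // Average number of terms scanned per random term access: interval / 2.
    public static double AverageTermsScanned(int interval)
    {
        return interval / 2.0;
    }

    public static void Main()
    {
        long numUniqueTerms = 10000000; // hypothetical index size
        foreach (int interval in new[] { 32, 128, 512 })
        {
            Console.WriteLine(
                "interval=" + interval +
                ": about " + TermsHeldInMemory(numUniqueTerms, interval) + " terms in memory, " +
                "about " + AverageTermsScanned(interval) + " terms scanned per lookup");
        }
    }
}
```

Raising the interval from 32 to 512 cuts the in-memory term count by a factor of 16 while lengthening the average scan by the same factor, which is the memory versus lookup-speed trade-off the description refers to.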

SetUseCompoundFile() public method

Sets whether the IndexWriter should pack newly written segments in a compound file. Default is true.

Use false for batch indexing with very large RAM buffer settings.

Note: To control compound file usage during segment merges, see MergePolicy#setNoCFSRatio(double) and MergePolicy#setMaxCFSSegmentSizeMB(double). This setting only applies to newly created segments.

public SetUseCompoundFile ( bool useCompoundFile ) : LiveIndexWriterConfig
useCompoundFile bool
Returns LiveIndexWriterConfig

ToString() public method

public ToString ( ) : string
Returns string

Property details

Commit protected property

IndexCommit that IndexWriter is opened on.
protected Lucene.Net.Index.IndexCommit Commit
Returns IndexCommit

MatchVersion protected property

Version that IndexWriter should emulate.
protected Version MatchVersion
Returns Version

PerThreadHardLimitMB protected property

The hard upper bound on RAM usage for a single segment, after which the segment is forced to flush.
protected int PerThreadHardLimitMB
Returns int

checkIntegrityAtMerge protected property

True if merging should check the integrity of segments before the merge.
protected bool checkIntegrityAtMerge
Returns bool

codec protected property

Codec used to write new segments.
protected Codec codec
Returns Codec

delPolicy protected property

IndexDeletionPolicy controlling when commit points are deleted.
protected IndexDeletionPolicy delPolicy
Returns IndexDeletionPolicy

flushPolicy protected property

FlushPolicy to control when segments are flushed.
protected Lucene.Net.Index.FlushPolicy flushPolicy
Returns Lucene.Net.Index.FlushPolicy

indexerThreadPool protected property

DocumentsWriterPerThreadPool to control how threads are allocated to DocumentsWriterPerThread.
protected Lucene.Net.Index.DocumentsWriterPerThreadPool indexerThreadPool
Returns Lucene.Net.Index.DocumentsWriterPerThreadPool

indexingChain protected property

IndexingChain that determines how documents are indexed.
protected IndexingChain indexingChain
Returns IndexingChain

infoStream protected property

InfoStream for debugging messages.
protected Lucene.Net.Util.InfoStream infoStream
Returns Lucene.Net.Util.InfoStream

mergePolicy protected property

MergePolicy for selecting merges.
protected Lucene.Net.Index.MergePolicy mergePolicy
Returns MergePolicy

mergeScheduler protected property

MergeScheduler to use for running merges.
protected Lucene.Net.Index.MergeScheduler mergeScheduler
Returns Lucene.Net.Index.MergeScheduler

openMode protected property

OpenMode that IndexWriter is opened with.
protected OpenMode_e? openMode
Returns OpenMode_e?

readerPooling protected property

True if readers should be pooled.
protected bool readerPooling
Returns bool

similarity protected property

Similarity to use when encoding norms.
protected Similarity similarity
Returns Similarity

useCompoundFile protected property

True if segment flushes should use the compound file format.
protected bool useCompoundFile
Returns bool

writeLockTimeout protected property

Timeout when trying to obtain the write lock on init.
protected long writeLockTimeout
Returns long