org.ragna.comet.stream.extractors.kafka
Type members
Classlikes
StreamExtractor capable of extracting RDF items from an Apache Kafka stream of items
Kafka streams are operated as FS2 streams thanks to the FS2-Kafka library (see the usage sketch after this entry)
- Type parameters:
- K
Type to which kafka-stream keys are deserialized
- V
Type to which kafka-stream values are deserialized
- Value parameters:
- concurrentItems
Maximum number of items to be parsed for RDF in parallel (to parallelize the pulling of items from Kafka, see configuration.concurrentItems)
- configuration
Configuration supplied to this extractor to interact with a Kafka stream
- format
Format of the RDF data arriving from the stream; the extractor expects all data items to share the same format
- inference
Inference of the RDF data arriving from the stream; the extractor expects all data items to share the same inference
- keyDeserializer
Deserializer used to map the incoming stream keys to instances of type K
- valueDeserializer
Deserializer used to map the incoming stream values to instances of type V
- Throws:
- KafkaException
When the underlying Kafka consumer cannot be built or the connection fails
- See also:
Apache Kafka: https://kafka.apache.org/
FS2-Kafka: https://fd4s.github.io/fs2-kafka/
- Companion:
- object
- Source:
- KafkaExtractor.scala
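As a rough usage sketch, an extractor for a stream of String keys and values might be built as follows. This is a minimal sketch assuming a constructor matching the parameter list above; the import paths and constants for the format and inference types (`DataFormat`, `InferenceEngine`) are hypothetical placeholders, and the deserializers may actually be supplied implicitly:

```scala
import cats.effect.IO
import fs2.kafka.Deserializer
import org.ragna.comet.stream.extractors.kafka.{KafkaExtractor, KafkaExtractorConfiguration}
// Hypothetical imports: the real locations of the format/inference types
// are not shown on this page.
import org.ragna.comet.data.DataFormat
import org.ragna.comet.data.InferenceEngine

// Configuration pointing at the target topic (only the topic is required).
val configuration = KafkaExtractorConfiguration(topic = "rdf-items")

// Extractor parsing up to 4 items in parallel; keys and values are read
// as Strings through FS2-Kafka's built-in String deserializer.
val extractor = KafkaExtractor[String, String](
  configuration = configuration,
  format = DataFormat.TURTLE,       // assumed format constant
  inference = InferenceEngine.NONE, // assumed inference constant
  concurrentItems = 4
)(
  keyDeserializer = Deserializer[IO, String],
  valueDeserializer = Deserializer[IO, String]
)
```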
Configuration and parameters related to how the program should interact with a Kafka stream (stream origin location, committing, polling, etc.)
Defaults are provided for most settings, but a target topic is required; see the example after this entry
- Value parameters:
- autoCommit
Whether to automatically commit offsets periodically (see commitInterval) or not
- commitInterval
Time to wait before performing an offset commit operation
- commitRecovery
Strategy to be followed when a commit operation fails
- commitTimeout
Time to wait for offsets to be committed before raising an error
- concurrentItems
Maximum number of items to be extracted and parsed for RDF in parallel (set to 1 for sequential execution; bear in mind that high values won't necessarily translate into performance improvements)
- groupId
Id of the group to which the underlying Kafka consumer belongs
- offsetReset
Strategy to be followed when there are no available offsets to consume on the consumer group
- pollInterval
Time interval between poll operations
- pollTimeout
How long a stream polling operation is allowed to block
- port
Port on which the server is broadcasting
- server
Hostname or IP address of the server broadcasting data
- sessionTimeout
Consumer timeout for connecting to Kafka or processing incoming items; some Kafka servers may require certain session timeouts
- topic
Name of the topic from which data will be extracted
- Note:
Commit-behaviour settings are meaningless if autoCommit is set to false
Some restrictions are in place for configuration instances: invalid configuration values (e.g., an invalid port, instant (zero-length) timeouts/intervals) will not be accepted (see testConfiguration)
- Companion:
- object
- Source:
- KafkaExtractorConfiguration.scala
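For illustration, a configuration might be created like this; a minimal sketch assuming named parameters matching the list above (the broker address is a placeholder):

```scala
import org.ragna.comet.stream.extractors.kafka.KafkaExtractorConfiguration

// Only the topic is mandatory; the other arguments override defaults and
// could be omitted. Parameter names follow the documentation above.
val config = KafkaExtractorConfiguration(
  topic = "rdf-items",   // topic from which data will be extracted
  server = "localhost",  // broker hostname (placeholder)
  port = 9092,           // broker port (placeholder)
  concurrentItems = 1    // sequential extraction
)
```

Per the note above, invalid values (e.g., a malformed port or an instant timeout) would be rejected.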
Companion object with utils used in Kafka extractor configurations
- Companion:
- class
- Source:
- KafkaExtractorConfiguration.scala