FileExtractor

case class FileExtractor(files: Seq[Path], charset: Charset, format: DataFormat, inference: InferenceEngine, concurrentItems: Int, itemTimeout: Option[FiniteDuration]) extends StreamExtractor[String]

StreamExtractor capable of extracting RDF items from a list of files. Each file is expected to contain a single RDF item.

Inspired by https://fs2.io/#/getstarted/example

Value parameters:
charset

Charset used to decode the contents of the requested files

concurrentItems

Maximum number of items to be extracted and parsed for RDF in parallel. Set it to 1 for sequential execution. Bear in mind that higher values do not necessarily translate into performance improvements, so raise it with care.

files

List of files to be processed, represented by their paths

format

Format of the RDF data arriving from the stream. The extractor expects all data items to share the same format.

inference

Inference engine applied to the RDF data arriving from the stream. The extractor expects all data items to share the same inference.
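The effect of concurrentItems can be pictured with fs2's parEvalMap, which evaluates at most a given number of effects at once and still emits results in input order. This is a minimal sketch, not FileExtractor's actual code: parseItem is a hypothetical stand-in for extracting and parsing one item, and the scala-cli directives and library versions are assumptions.

```scala
//> using scala "3.3.1"
//> using dep "co.fs2::fs2-core:3.9.3"
//> using dep "org.typelevel::cats-effect:3.5.2"

import cats.effect.IO
import fs2.Stream

// Hypothetical stand-in for "extract and parse one item for RDF":
// it only reports the length of the raw text.
def parseItem(raw: String): IO[Int] = IO(raw.length)

// Evaluate parseItem on at most `concurrentItems` items at a time.
// parEvalMap keeps results in input order; with 1 it runs sequentially.
def processAll(items: List[String], concurrentItems: Int): IO[List[Int]] =
  Stream
    .emits(items)
    .covary[IO]
    .parEvalMap(concurrentItems)(parseItem)
    .compile
    .toList
```

Larger values only pay off when the per-item work dominates the cost of scheduling the extra fibers, which is in line with the caution given for concurrentItems.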

Note:

StreamExtractor's type parameter is set to String, since data read from files is interpreted as strings.

Companion:
object
Source:
FileExtractor.scala
Supertypes:
trait Serializable
trait Product
trait Equals
class StreamExtractor[String]
class Object
trait Matchable
class Any

Value members

Inherited methods

protected def checkConfiguration(): Unit

Check the user-controlled inputs to this extractor, preventing its creation if any of them is invalid

Throws:
IllegalArgumentException

On invalid extractor parameters

Inherited from:
StreamExtractor
Source:
StreamExtractor.scala
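A check of this kind is commonly written with Scala's require, which throws IllegalArgumentException when its condition is false. The rules below (concurrentItems of at least 1, and a positive itemTimeout when one is given) are assumptions chosen for illustration; the checks StreamExtractor actually performs are not listed on this page.

```scala
import scala.concurrent.duration._

// Hypothetical validation in the spirit of checkConfiguration.
// The specific rules are assumptions, not StreamExtractor's real checks.
def checkConfiguration(concurrentItems: Int, itemTimeout: Option[FiniteDuration]): Unit = {
  // require throws IllegalArgumentException("requirement failed: ...") on failure
  require(concurrentItems >= 1, s"concurrentItems must be at least 1, got $concurrentItems")
  itemTimeout.foreach { t =>
    require(t > Duration.Zero, s"itemTimeout must be positive, got $t")
  }
}
```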
def productElementNames: Iterator[String]
Inherited from:
Product
def productIterator: Iterator[Any]
Inherited from:
Product

Concrete fields

lazy override private[extractors] val inputStream: Stream[IO, String]

Get the initial input stream by reading the bytes of each file in the list and decoding them according to charset

Note:

Parallelism in file reading is attempted via prefetch

Source:
FileExtractor.scala
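The reading step can be sketched with fs2-io, in the spirit of the fs2 getting-started example this class cites: each file is read fully and decoded with the configured charset into one String, and prefetch buffers one element ahead so the next file can be read while downstream work on the current one is still running. readItems is a hypothetical name and the directives and versions are assumptions; this is not the library's source.

```scala
//> using scala "3.3.1"
//> using dep "co.fs2::fs2-io:3.9.3"
//> using dep "org.typelevel::cats-effect:3.5.2"

import cats.effect.IO
import fs2.{Stream, text}
import fs2.io.file.{Files, Path}
import java.nio.charset.Charset

// Read every file completely and decode it with `charset`,
// emitting one String per file (one RDF item per file).
def readItems(files: Seq[Path], charset: Charset): Stream[IO, String] =
  Stream
    .emits(files)
    .covary[IO]
    .evalMap { path =>
      Files[IO]
        .readAll(path)
        .through(text.decodeWithCharset[IO](charset))
        .compile
        .string
    }
    // Buffer one element ahead so the next file can be read
    // while downstream is still handling the current one.
    .prefetch
```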

Inherited fields

lazy val dataStream: Stream[IO, RDFValidationItem]

The initial inputStream, transformed through toDataItems into a stream of RDF items

Inherited from:
StreamExtractor
Source:
StreamExtractor.scala
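One way to picture how a stream like dataStream could be derived from inputStream is a pipe applied with through. In this sketch toDataItems, Item (standing in for RDFValidationItem) and the line-counting parse function are all hypothetical, and it assumes itemTimeout is enforced per item with IO's timeout, which fails the stream with a java.util.concurrent.TimeoutException when an item takes too long.

```scala
//> using scala "3.3.1"
//> using dep "co.fs2::fs2-core:3.9.3"
//> using dep "org.typelevel::cats-effect:3.5.2"

import cats.effect.IO
import fs2.{Pipe, Stream}
import scala.concurrent.duration.FiniteDuration

// Hypothetical stand-in for RDFValidationItem.
final case class Item(raw: String, statements: Int)

// Hypothetical parser: counts lines ending in " ." as statements.
def parse(raw: String): IO[Item] =
  IO(Item(raw, raw.linesIterator.count(_.trim.endsWith(" ."))))

// Hypothetical toDataItems: parse up to `concurrentItems` items at once and,
// if a timeout is configured, fail any item that takes longer than it.
def toDataItems(concurrentItems: Int, itemTimeout: Option[FiniteDuration]): Pipe[IO, String, Item] =
  _.parEvalMap(concurrentItems) { raw =>
    itemTimeout.fold(parse(raw))(t => parse(raw).timeout(t))
  }
```

Applied as inputStream.through(toDataItems(concurrentItems, itemTimeout)), this yields a stream analogous in shape to dataStream.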