Description
I tried to parse a dump of some Wikipedia pages with XmlProvider, but no matter what I try, I get a
System.OutOfMemoryException. Is there any guidance or a recommended pattern for parsing large files with type providers?
The file is almost exactly 2 GB.
My code:
#r "nuget: FSharp.Data"
open FSharp.Data
open System
open System.IO

type Wiki = XmlProvider<"""data/wikidata_sample.xml""">

let xmlFromFile =
    task {
        let path = "data/wikidata.xml"
        let! text = File.ReadAllTextAsync(path)
        Wiki.Parse(text).Pages
        |> Array.map (fun f -> f.Revision.Text)
        |> Array.iter (fun f -> printfn $"{f}")
    }

let xmlFromStream =
    let options =
        new FileStreamOptions(BufferSize=32)
    use stream = new FileStream("data/wikidata.xml", options)
    stream
    |> Wiki.Load
    |> fun f -> f.Pages
    |> Array.map (fun f -> f.Revision.Text.Value)
    |> Array.iter (fun f -> printfn $"{f}")

xmlFromStream
// xmlFromFile
// |> Async.AwaitTask
// |> Async.RunSynchronously
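For comparison, here is a minimal sketch of a streaming pattern that avoids materializing the whole document. It does not use the type provider at all: it walks the file with System.Xml.XmlReader and yields one element at a time via XNode.ReadFrom. The function name streamPages and the element names "page" and "text" are assumptions based on the property names in the code above, not something confirmed for this particular dump.

```fsharp
open System.Xml
open System.Xml.Linq

/// Lazily yields each <page> element as an XElement, one at a time.
/// "page" is an assumed element name; matching on LocalName ignores any XML namespace.
let streamPages (path: string) : seq<XElement> =
    seq {
        use reader = XmlReader.Create(path)
        reader.MoveToContent() |> ignore
        while not reader.EOF do
            if reader.NodeType = XmlNodeType.Element && reader.LocalName = "page" then
                // ReadFrom consumes the whole element and leaves the reader
                // on the node that follows it, so no extra Read() is needed here.
                yield XNode.ReadFrom(reader) :?> XElement
            else
                reader.Read() |> ignore
    }

/// Prints the content of every <text> element (assumed name) found under each page.
let printTexts (path: string) =
    for page in streamPages path do
        page.Descendants()
        |> Seq.filter (fun e -> e.Name.LocalName = "text")
        |> Seq.iter (fun e -> printfn "%s" e.Value)
```

Calling printTexts "data/wikidata.xml" would then hold only one page subtree in memory at a time. Whether each subtree can be passed on to a provider type inferred from a single-page sample is left open here.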