[Code Snippet] Parse FB2 format book in C#
Here is a code snippet that parses a book in FB2 format using C#. FB2, also known as the FictionBook format, is a fairly simple file format based on XML with basic HTML-like markup. An FB2 file contains sections. Sections can contain other sections as well as paragraphs. Paragraphs contain the actual text and basic formatting.
The program below loads all sections recursively and prints their titles to the console. You can also use the fb2verse variable to get the actual text of each paragraph. The approach uses the XDocument class to get the XElements. The only trick with further parsing is to use the correct XNamespace: every element name in a lookup has to be qualified with the FB2 namespace.
using System;
using System.Xml.Linq;

namespace BookParser
{
    public class Program
    {
        // Every FB2 element lives in this namespace, so each lookup must be qualified with it.
        static XNamespace fbSpace = "http://www.gribuser.ru/xml/fictionbook/2.0";

        public static void Main(string[] args)
        {
            XDocument doc = XDocument.Load("https://azbyka.ru/biblia/downloads/bibliya.fb2");
            var body = doc.Root.Element(fbSpace + "body");
            ParseSection(body, 0, -1);
        }

        // Recursive function to parse sections; prints each title indented by nesting level.
        // bookid is passed through unchanged and is not used in this snippet.
        static void ParseSection(XElement body, int level, int bookid)
        {
            // Elements() returns an empty sequence, never null, so no null check is needed.
            foreach (var section in body.Elements(fbSpace + "section"))
            {
                // A section may have no title element; guard against a NullReferenceException.
                var title = section.Element(fbSpace + "title");
                var padding = new String(' ', level);
                Console.WriteLine("{0} title: {1}", padding, title != null ? title.Value : "(no title)");

                foreach (var paragraph in section.Elements(fbSpace + "p"))
                {
                    string fb2verse = paragraph.Value;
                    // use fb2verse to get the actual text
                    // Console.WriteLine(fb2verse);
                }

                ParseSection(section, level + 1, bookid);
            }
        }
    }
}
You can see the working fiddle here: https://dotnetfiddle.net/e5Z0iX
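To see why the XNamespace matters without downloading a book, here is a minimal sketch that parses a tiny inline document. The sample XML, the class name Fb2NamespaceSketch, and the helper SectionTitles are made up for illustration and are not part of the original snippet; only the namespace URI is taken from the program above. It assumes section titles wrap their text in a p element, as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class Fb2NamespaceSketch
{
    // Same namespace URI as in the program above.
    static readonly XNamespace FbSpace = "http://www.gribuser.ru/xml/fictionbook/2.0";

    // Made-up miniature FB2 document, for illustration only.
    public const string SampleXml =
        "<FictionBook xmlns=\"http://www.gribuser.ru/xml/fictionbook/2.0\">" +
        "<body>" +
        "<section><title><p>Part One</p></title>" +
        "<p>First paragraph.</p>" +
        "<section><title><p>Chapter 1</p></title><p>Nested paragraph.</p></section>" +
        "</section>" +
        "</body>" +
        "</FictionBook>";

    // Hypothetical helper: titles of all sections at any depth, in document order.
    // Sections without a title element are skipped.
    public static List<string> SectionTitles(XDocument doc)
    {
        return doc.Descendants(FbSpace + "section")
                  .Select(s => s.Element(FbSpace + "title"))
                  .Where(t => t != null)
                  .Select(t => t.Value.Trim())
                  .ToList();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(SampleXml);

        // An unqualified name finds nothing, because every element is in the FB2 namespace.
        Console.WriteLine(doc.Root.Element("body") == null);           // True
        // The namespace-qualified name finds the body element.
        Console.WriteLine(doc.Root.Element(FbSpace + "body") == null); // False

        foreach (var title in SectionTitles(doc))
            Console.WriteLine(title);
    }
}
```

Unlike the recursive ParseSection above, this sketch uses Descendants to flatten the section tree, which is shorter but loses the nesting level; the recursive version is the better fit when indentation by depth is needed.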