Using Html Agility Pack to parse nodes in a context sensitive fashion
See the question and my original answer on StackOverflowUsing the Html Agility Pack, you can leverage XPATH power - and forget about that verbose xlinq crap :-). The XPATH position() function is context sensitive. Here is a sample code:
HtmlDocument doc = new HtmlDocument();
doc.Load("your html file");
// select all DIV without a CLASS attribute defined
foreach (HtmlNode div in doc.DocumentNode.SelectNodes("//div[not(@class)]"))
{
Console.WriteLine("div=" + div.InnerText.Trim());
Console.WriteLine(" header=" + div.SelectSingleNode("preceding-sibling::div[position()=1]/b").InnerText);
Console.WriteLine(" date=" + div.SelectSingleNode("preceding-sibling::div[position()=2]/b").InnerText);
}
That will prrint this with your sample:
div=inner hmtl 1
header=Header 1
date=Date 1
div=inner html 2
header=Header 2
date=Date 2