Losing the 'less than' sign in HtmlAgilityPack LoadHtml
See the question and my original answer on StackOverflow.
The Html Agility Pack detects this as an error and creates an HtmlParseError instance for it. You can read all errors through the ParseErrors property of the HtmlDocument class. So, if you run this code:
string s = "<span style=\"color: #0000FF;\"><</span>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(s);
doc.Save(Console.Out);
Console.WriteLine();
Console.WriteLine();
foreach (HtmlParseError err in doc.ParseErrors)
{
    Console.WriteLine("Error");
    Console.WriteLine(" code=" + err.Code);
    Console.WriteLine(" reason=" + err.Reason);
    Console.WriteLine(" text=" + err.SourceText);
    Console.WriteLine(" line=" + err.Line);
    Console.WriteLine(" pos=" + err.StreamPosition);
    Console.WriteLine(" col=" + err.LinePosition);
}
It will display this (the corrected text first, then the details of the error):
<span style="color: #0000FF;"></span>
Error
 code=EndTagNotRequired
 reason=End tag </> is not required
 text=<
 line=1
 pos=30
 col=31
So you can try to fix this error yourself, since you have all the required information (including line, column, and stream position), but the general process of fixing (as opposed to detecting) errors in HTML is very complex.
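For this one specific case (a stray '<' reported as EndTagNotRequired), a fix could be sketched roughly as follows. This is only a minimal illustration, not a general repair strategy: StrayLessThanFixer, EscapeAt, and EscapeStrayLessThan are hypothetical names, and the sketch assumes that StreamPosition is the zero-based index of the offending character in the input string, which is what pos=30 in the output above suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

public static class StrayLessThanFixer
{
    // Pure helper: replaces the '<' found at each given index with "&lt;".
    // Indexes that are out of range or do not point at a '<' are ignored.
    public static string EscapeAt(string html, IEnumerable<int> positions)
    {
        var sb = new StringBuilder(html);
        // Work from the end so earlier indexes stay valid after each replacement.
        foreach (int pos in positions.Distinct().OrderByDescending(p => p))
        {
            if (pos >= 0 && pos < sb.Length && sb[pos] == '<')
            {
                sb.Remove(pos, 1).Insert(pos, "&lt;");
            }
        }
        return sb.ToString();
    }

    // Assumption: StreamPosition is the zero-based index of the stray '<'
    // in the original string (consistent with pos=30 for the sample input).
    public static string EscapeStrayLessThan(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var positions = doc.ParseErrors
            .Where(e => e.Code == HtmlParseErrorCode.EndTagNotRequired
                        && e.SourceText == "<")
            .Select(e => e.StreamPosition);
        return EscapeAt(html, positions);
    }
}
```

You would call StrayLessThanFixer.EscapeStrayLessThan(s) first and pass the result to LoadHtml. The guard on sb[pos] == '<' means that if the reported position does not point at a '<', the input is returned unchanged rather than corrupted. Other error codes, and a '<' in other contexts, are deliberately left alone.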