Parse XHTML document with undefined entity
See the question and my original answer on StackOverflow. Entity resolution is done by the underlying parser, which here is a standard XmlReader (or XmlTextReader).
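To see where entity handling happens, here is a small sketch. The `EntityProbe` class, its methods and the `<p>a&nbsp;b</p>` input are hypothetical. `XDocument.Parse` rejects the undeclared entity with an `XmlException`. An `XmlTextReader` with its default `EntityHandling` (`ExpandCharEntities`) instead reports `&nbsp;` as an `EntityReference` node. That node is what a derived reader can intercept by overriding `ResolveEntity()`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class, used only to illustrate where entities are handled.
public static class EntityProbe
{
    // Returns true if XDocument.Parse rejects the input with an XmlException,
    // which is what happens when no DTD declares an entity such as &nbsp;.
    public static bool ParseRejects(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }

    // Lists "NodeType:Name" for every node an XmlTextReader reports.
    // With the default EntityHandling (ExpandCharEntities), a general entity
    // such as &nbsp; comes back as an EntityReference node instead of text.
    public static List<string> ListNodes(string xml)
    {
        var nodes = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                nodes.Add(reader.NodeType + ":" + reader.Name);
            }
        }
        return nodes;
    }
}
```

For example, `EntityProbe.ListNodes("<p>a&nbsp;b</p>")` contains an `"EntityReference:nbsp"` entry between the two text nodes.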
Officially, you're supposed to declare entities in DTDs (see Oleg's answer here: Problem with XHTML entities), or load DTDs dynamically into your documents. There are some examples of this on SO, such as: How do I resolve entities when loading into an XDocument?
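Both official options can be sketched as follows. This assumes a reader created with `XmlReader.Create` and `DtdProcessing.Parse`. `DtdEntityExamples`, its methods and the sample markup are hypothetical. The first method prepends an internal DTD subset that declares `nbsp` (so the input must not start with an XML declaration). The second keeps a standard XHTML 1.0 Strict DOCTYPE and uses `XmlPreloadedResolver` (namespace `System.Xml.Resolvers`). That resolver serves the XHTML DTDs and their entity files from memory instead of downloading them from w3.org.

```csharp
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Resolvers;

// Hypothetical helper class illustrating the two DTD-based approaches.
public static class DtdEntityExamples
{
    // Option 1: declare the entity in an internal DTD subset.
    // Assumes 'body' has no XML declaration, since the DOCTYPE is prepended.
    public static XDocument LoadWithInternalSubset(string body)
    {
        string xml = "<!DOCTYPE html [ <!ENTITY nbsp \"&#160;\"> ]>" + body;
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            return XDocument.Load(reader);
        }
    }

    // Option 2: keep the standard XHTML 1.0 DOCTYPE and let XmlPreloadedResolver
    // supply the XHTML DTDs (and the entity files they reference) from memory.
    public static XDocument LoadWithPreloadedXhtmlDtd(string xhtml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), settings))
        {
            return XDocument.Load(reader);
        }
    }

    // Text of the first <p> element, matched by local name because the XHTML DTD
    // supplies a default xmlns attribute that puts the elements in a namespace.
    public static string FirstParagraph(XDocument doc)
    {
        return doc.Descendants().First(e => e.Name.LocalName == "p").Value;
    }
}
```

The internal-subset variant requires rewriting the document. The resolver variant only helps when the file already carries a DOCTYPE the resolver knows (here XHTML 1.0).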
Alternatively, you can create a (somewhat hacky) class derived from XmlTextReader that returns Text nodes whenever it encounters an entity reference, looking up the replacement text in a dictionary, as in the following sample code:
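using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

using (XmlTextReaderWithEntities reader = new XmlTextReaderWithEntities(MyXmlFile))
{
    // map &nbsp; to a non-breaking space (U+00A0)
    reader.AddEntity("nbsp", "\u00A0");
    XDocument xdoc = XDocument.Load(reader);
}

...

public class XmlTextReaderWithEntities : XmlTextReader
{
    // replacement text produced by ResolveEntity(), surfaced by the next Read()
    private string _pendingText;
    // non-null while the reader is positioned on the synthetic Text node
    private string _currentText;
    private readonly Dictionary<string, string> _entities = new Dictionary<string, string>();

    // NOTE: add the other XmlTextReader constructors for completeness
    public XmlTextReaderWithEntities(string path)
        : base(path)
    {
    }

    public void AddEntity(string entity, string value)
    {
        _entities[entity] = value;
    }

    public override bool Read()
    {
        // after ResolveEntity(), stay on the entity reference but report it as Text
        if (_pendingText != null)
        {
            _currentText = _pendingText;
            _pendingText = null;
            return true;
        }
        // leaving the synthetic node: base.Read() moves past the entity reference
        _currentText = null;
        return base.Read();
    }

    public override XmlNodeType NodeType
    {
        get { return _currentText != null ? XmlNodeType.Text : base.NodeType; }
    }

    // reading Value has no side effects, so callers may read it more than once
    public override string Value
    {
        get { return _currentText ?? base.Value; }
    }

    public override bool HasValue
    {
        get { return _currentText != null || base.HasValue; }
    }

    public override void ResolveEntity()
    {
        // unknown entity: pass the reference through as literal text, e.g. "&foo;"
        if (!_entities.TryGetValue(LocalName, out _pendingText))
        {
            _pendingText = "&" + LocalName + ";";
        }
        // NOTE: base.ResolveEntity() is deliberately not called; whether you want it depends on the scenario
    }
}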
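This approach works in simple scenarios, but for completeness you may need to override other XmlReader members as well (for example Name and LocalName, which still report the entity's name while the reader is positioned on the synthetic Text node).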
PS: sorry it's in C#; you'll have to adapt it to VB.NET :)