Retrieve attributes and span using HTMLAgilityPack library
See the question and my original answer on StackOverflowYou were not that far from the solution. Two important notes:
//
is a recursive call. It can have some heavy performance impact, and also it may select nodes you don't want, so I suggest you only use it when the hierarchy is deep or complex or variable, and you don't want to specify the whole path.- There is a useful helper method on
XmlNode
namedGetAttributeValue
which will you get an attribute even if it does not exist (you need to specify the default value).
Here is a sample that seems to work:
' select the base/parent DIV (here we use a discriminant CLASS attribute)
' all select calls below will use this DIV element as a starting point
Dim node As HtmlNode = htmldoc.DocumentNode.SelectNodes("//div[@class='item']")
' get to the A tag which is a child or grand child (//) of a 'release' DIV
Console.WriteLine(("Title :" & node.SelectSingleNode("div[@class='release']//a").GetAttributeValue("title", CStr(Nothing))))
' get to the IMG tag which is a child or grand child (//) of a 'thumb' DIV
Console.WriteLine(("Cover :" & node.SelectSingleNode("div[@class='thumb']//img").GetAttributeValue("src", CStr(Nothing))))
' get to the SPAN tag which is a child or grand child (//) of a 'release-year' DIV
Console.WriteLine(("Year :" & node.SelectSingleNode("div[@class='release-year']//span").InnerText))
' get all A elements which are child or grand child(//) of a 'genre' DIV
Dim nodes As HtmlNodeCollection = node.SelectNodes("div[@class='genre']//a")
Dim i As Integer
For i = 0 To nodes.Count - 1
Console.WriteLine(String.Concat(New Object() { "Genre", (i + 1), ":", nodes.Item(i).InnerText }))
Next i
' get to the A tag which is a child or grand child (//) of a 'release' DIV
Console.WriteLine(("Url :" & node.SelectSingleNode("div[@class='release']//a").GetAttributeValue("href", CStr(Nothing))))