These guides should roughly follow what comes with the client by default! You might like to have the actual UI open in front of you so you can play around with the rules and try different test parses yourself.

The parser gets a good name and some example URLs. The gallery page has 75 thumbnails and a bunch of page URLs at the bottom. e621 has some different ways of writing out their queries (and as they use some tags with '/', like 'male/female', this can cause character encoding issues depending on whether the tag is in the path or query!), but we'll put that off for now; we just want to parse some stuff.

Most browsers have some good developer tools to let you Inspect Element and get a better view of the HTML DOM. Be warned that this information isn't always the same as View Source (which is what hydrus will get when it downloads the initial HTML document), as some sites load results dynamically with javascript and maybe an internal JSON API call. When sites move to systems that load more thumbs as you scroll down, it makes our job more difficult: in these cases, you'll need to chase down the embedded JSON or figure out what API calls their JS is making. The browser's developer tools can help you here again.

Thankfully, e621 is (and most boorus are) fairly static and simple. Every thumb on e621 is an element with class="thumb" wrapping a link and an image. This is a common pattern, and easy to parse. There's no tricky String Matches or String Converters needed; we are just fetching hrefs.

Note that the links get relative-matched for now. I'll probably fix this to apply to one of the example URLs, but rest assured that IRL the parser will 'join' its URL up with the appropriate Gallery URL used to fetch the data.
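
As a rough illustration of the thumbnail pattern described above (fetch the href inside each class="thumb" element, then join it to the gallery URL), here is a minimal Python sketch using only the standard library. It is not how hydrus implements its parsers, which are built through the client's parsing UI. The sample markup, the `example.com` gallery URL, and the names `ThumbLinkParser` and `parse_post_urls` are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ThumbLinkParser(HTMLParser):
    """Collect the href of every <a> nested inside a class="thumb" element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._thumb_tag = None  # tag name of the thumb container we are inside
        self._depth = 0         # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._thumb_tag is None:
            # Not inside a thumb yet: only look for a container to enter.
            if "thumb" in classes:
                self._thumb_tag = tag
                self._depth = 1
            return
        if tag == self._thumb_tag:
            self._depth += 1
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if self._thumb_tag is not None and tag == self._thumb_tag:
            self._depth -= 1
            if self._depth == 0:
                self._thumb_tag = None  # left the thumb container


def parse_post_urls(html, gallery_url):
    """Return absolute post URLs, joining relative hrefs to the gallery URL."""
    parser = ThumbLinkParser()
    parser.feed(html)
    return [urljoin(gallery_url, href) for href in parser.hrefs]


# Hypothetical markup and URL, invented for this sketch.
SAMPLE_HTML = """
<div id="posts">
  <span class="thumb"><a href="/post/show/111"><img src="/thumbs/111.jpg"></a></span>
  <span class="thumb"><a href="/post/show/222"><img src="/thumbs/222.jpg"></a></span>
  <a href="/post/index/2">next page</a>
</div>
"""
GALLERY_URL = "https://example.com/post/index/1/some_tag"

post_urls = parse_post_urls(SAMPLE_HTML, GALLERY_URL)
print(post_urls)
```

The "next page" link is skipped because it sits outside any class="thumb" element. Page URLs would need a separate rule of their own.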