I recently had to parse some markdown using the marked npm package and convert it into JSON objects for a project I’m working on. When I parsed the markdown I’d get back an array of tokens that would look something like the following:
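As a rough illustration only, a short two-item list might tokenize to something like the array below. The token names `list_start`, `list_item_start`, `text`, and `list_item_end` come from this post; `list_end`, the `Token` interface, and the sample values are assumptions, and the exact shape depends on the version of marked in use.

```typescript
// Illustrative token shape; not the exact output of any specific marked version.
interface Token {
  type: string;
  text?: string;
}

// Hypothetical tokens for a two-item markdown list.
const tokens: Token[] = [
  { type: "list_start" },
  { type: "list_item_start" },
  { type: "text", text: "First item" },
  { type: "list_item_end" },
  { type: "list_item_start" },
  { type: "text", text: "Second item" },
  { type: "list_item_end" },
  { type: "list_end" }, // assumed closing token
];
```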
I started out the “normal” way by doing a for…of loop to iterate through the tokens in the array. This worked, but tracking the start and end of a token meant adding extra variables, which ultimately complicated the code. For example, how do you know if you’re in a list? You track it with an inList variable or something similar. That works, but it could definitely be better, especially since lists were only one of several types of objects I needed to track.
As the code progressed, I realized that sometimes I needed the index value as I was looping through the tokens. So, I changed the code to loop through the tokens using a standard for loop. While that worked, I still had the problem of tracking where I was in the overall object that was processed (such as a list of items) and it wasn’t as simple as I wanted when I needed to move to the next token manually.
For example, to get all of the items in the list, I had to set an inList type of variable when the list_start token was encountered. Then when the looping continued I had to look for list_item_start, and then the text token. Since I couldn’t access list_start and then easily move down 2 spots to the text I wanted, it was more challenging than it should have been. While I made it work, there were several other scenarios where I ran into this challenge as well.
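To make the problem concrete, here is a rough sketch of what flag-based tracking can look like. It is a reconstruction for illustration, not the original code: the function name `collectListItems`, the `inListItem` flag, and the `list_end` token are assumptions.

```typescript
interface Token {
  type: string;
  text?: string;
}

// Flag-based approach: every nesting level needs its own piece of state.
function collectListItems(tokens: Token[]): string[] {
  const items: string[] = [];
  let inList = false;     // set between list_start and list_end
  let inListItem = false; // set between list_item_start and list_item_end

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    switch (token.type) {
      case "list_start":
        inList = true;
        break;
      case "list_end": // assumed closing token
        inList = false;
        break;
      case "list_item_start":
        inListItem = true;
        break;
      case "list_item_end":
        inListItem = false;
        break;
      case "text":
        // Only text that sits inside a list item counts as a list entry.
        if (inList && inListItem && token.text !== undefined) {
          items.push(token.text);
        }
        break;
    }
  }
  return items;
}
```

Two flags are manageable, but each additional structure to track adds more flags and more cases, which is where the complexity creeps in.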
Although I was able to get my first iteration of the code working fairly quickly, it felt really complex and didn’t sit well with me at all. It was one of those moments where you realize that while the code works, you’ll never be able to maintain it in the future without remembering all of the little variables that were added and how they were used. If I can’t look at code in the future and get a quick feel for what it’s doing without a lot of analysis, then the code is probably more complex than it needs to be.
I tweeted the following about the current state of the code since it was amusing how it started out so simple and then became so complex:
I started the process of refactoring the code and came up with some good optimizations, but tracking “Where the hell am I…I’m lost!” in the token array was still challenging. After thinking about it more and considering other options such as map/filter, I decided to bite the bullet and refactor the code yet again to use the iterator pattern to make it easy to know where I was in the process.
The iterator pattern is a design pattern in which an iterator is used to traverse a container and access the container’s elements. (Source: https://en.wikipedia.org/wiki/Iterator_pattern)
I realized early on that using this type of pattern might be easier (I used it a lot in other languages/frameworks), but I was too far down the rabbit hole to go back up. After reaching the bottom of the hole I realized it would be worth the time to convert the code.
I added the following code into the class I was working with to enable doing custom iterations over the tokens:
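A minimal sketch of such an iterator might look like the following. It is written as a standalone class for readability rather than as members of an existing class, and the names `TokenIterator` and `hasNext()` are assumptions; only `next()` and `peek()` are named in this post.

```typescript
interface Token {
  type: string;
  text?: string;
}

class TokenIterator {
  private tokens: Token[];
  private index = -1; // position of the most recently returned token

  constructor(tokens: Token[]) {
    this.tokens = tokens;
  }

  // True when at least one more token can be consumed.
  hasNext(): boolean {
    return this.index + 1 < this.tokens.length;
  }

  // Advance to the next token and return it, or undefined at the end.
  next(): Token | undefined {
    if (!this.hasNext()) {
      return undefined;
    }
    this.index++;
    return this.tokens[this.index];
  }

  // Look at the upcoming token without consuming it.
  peek(): Token | undefined {
    return this.tokens[this.index + 1];
  }
}
```

Returning `undefined` at the end of the tokens keeps calling code simple: a loop can just check the result of `next()` instead of catching an exception.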
This enabled me to easily move from token to token without relying on some type of for loop. It also added the ability to “peek” at the next item without consuming it (more on this in a moment). If you’ve worked with Java, C#, or other languages you’ll recognize this type of pattern since it’s very common in many languages and is one of the Gang of Four (GoF) patterns.
By adding the token iterator I could now do something like the following to iterate through the tokens.
This meant that any time I needed to move to the next item I could simply call this.iterator.next(). That made working with nested child object scenarios MUCH easier overall. For example, working with a list meant iterating over the tokens until I found the list_item_end token. No additional state tracking was needed to know where I was in the tokens.
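Here is a sketch of how list handling can look with an iterator, under a few assumptions: `createIterator` is a compact stand-in so the snippet runs on its own, `readList` and `readListItem` are hypothetical helper names, and `list_end` is an assumed closing token.

```typescript
interface Token {
  type: string;
  text?: string;
}

interface TokenIterator {
  next(): Token | undefined;
  peek(): Token | undefined;
}

// Compact stand-in iterator so this snippet is self-contained.
function createIterator(tokens: Token[]): TokenIterator {
  let index = -1;
  return {
    next: () => (index + 1 < tokens.length ? tokens[++index] : undefined),
    peek: () => tokens[index + 1],
  };
}

// Call after list_item_start was consumed; reads up to list_item_end.
function readListItem(iterator: TokenIterator): string {
  const parts: string[] = [];
  let token = iterator.next();
  while (token !== undefined && token.type !== "list_item_end") {
    if (token.text !== undefined) {
      parts.push(token.text);
    }
    token = iterator.next();
  }
  return parts.join(" ");
}

// Call after list_start was consumed; reads up to the assumed list_end.
function readList(iterator: TokenIterator): string[] {
  const items: string[] = [];
  let token = iterator.next();
  while (token !== undefined && token.type !== "list_end") {
    if (token.type === "list_item_start") {
      items.push(readListItem(iterator));
    }
    token = iterator.next();
  }
  return items;
}
```

Each helper consumes exactly the tokens that belong to its structure, so the position in the token stream replaces the `inList` style flags.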
By using the peek() function I could easily look at the next token without actually moving to it as well:
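As a small sketch of the idea, the snippet below checks whether the upcoming token starts a list without advancing the iterator. The helper name `nextStartsList` and the compact `createIterator` stand-in are assumptions made so the example runs on its own.

```typescript
interface Token {
  type: string;
  text?: string;
}

interface TokenIterator {
  next(): Token | undefined;
  peek(): Token | undefined;
}

// Compact stand-in iterator so this snippet is self-contained.
function createIterator(tokens: Token[]): TokenIterator {
  let index = -1;
  return {
    next: () => (index + 1 < tokens.length ? tokens[++index] : undefined),
    peek: () => tokens[index + 1],
  };
}

// Peek ahead to decide how to handle what comes next, without consuming it.
function nextStartsList(iterator: TokenIterator): boolean {
  const upcoming = iterator.peek();
  return upcoming !== undefined && upcoming.type === "list_start";
}
```

Because `peek()` leaves the position unchanged, the code that actually processes the list still sees `list_start` on its next call to `next()`.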
There are many more things that can be done to the iterator code to enhance it (such as adding support for custom predicates, “iterate until” type logic, etc.), but it’s easy to get started with and works well in the right situation.