The value of composable parts
I’ve found myself in some strange parsing tasks lately. This is a new thing for me, so don’t take this post as an example of the best practices for parsing. However, FWIW, all the parsers work.
The Setup
Say we have data that looks something like:
"1%400:3.2 6%some_description|100:1"
First we decide what we’re trying to pull out of this. These values happen to be space separated so we can just use the Prelude’s words
words theString
> ["1%400:3.2", "6%some_description|100:1"]
Each string in this list we’ll call a Feature
so we write a data type for it:
data Feature
= Feature
{ row :: String
, col :: String
, value :: String
, descriptor :: Maybe String
} deriving (Show)
Notice that we’re just reading this in as String data at the moment, but we can easily change that once we get the parsing structure down.
Anyway, almost done with the easy stuff. We need to pull the garbage data out somehow. That’s cool, we’ll just write out our signal matchers.
breakSep = string "%"
kvSep = string ":"
descriptionSep = string "|"
The Actual Parsing
Since we’ll be slurping up data until we hit one of the above defined separators, we’ll make a parser to do just that:
anythingUntil :: Parser String -> Parser String
anythingUntil p = manyTill anyToken (p *> return ())
This function eats up any type of input until it hits one of our separators and returns everything before it.
The way we’ll use this is pretty simple
featureP :: Parser Feature
featureP = do
row <- anythingUntil breakSep
desc <- descriptorP
col <- anythingUntil kvSep
value <- manyTill anyToken eof -- get the remaining
return $ Feature row col value desc
Now we need to fill in the optional descriptor
parser
descriptorP :: Parser (Maybe String)
descriptorP = optionMaybe $ try $ anythingUntil descSep
optionMaybe
allows us to optionally consume some data and return a Maybe value.
Since the anythingUntil parser can fail in this case, we need to use try
to save us from erroring out.
The Benefit over X
Personally, I find this easier to reason about than a regex or generic string functions. The point here is that I can easily expand on this and add new detailed parsers (This will be covered in part 2)
I’ve included a snapshot of the ihaskell session I was working in for full context here