Please provide any feedback or thoughts on the approach and the posted code below!
My Approach
So my approach was a straight forward approach (no real algorithm) that was implemented basically as a state machine. I new ahead of time that using match statements with rust was likely a good approach so I first proto-typed my approach in python and debugged a bit there before implementing the idea in rust.
My approach was basically to go through both files simultaneously line-by-line , looking for where lines matched or differed. When lines matched/differed in the same way, they were grouped into "chunks", each of a specific kind depending on the nature of the matching/differing. They key piece to sort out was how to detect the various ways in which the two files could differ from each other (eg, addition, deletion, modification).
I settled on this idea which seems to work: A particular kind of changed-chunk is identified by (and ends with) how its final line matches with one of the initial lines of the changed-chunk. The diagram below illustrates.
Beyond this the rest was sorting out the various states the process can enter and how to move between them.
Linked Lines are found to match or be identical
Addition:
A B
- -
|-| <-\ |-| <--- highlighted lines = current chunk of lines identified as changed
|-| \ |-|
- \--> - <--- Current line
Deletion:
- -
|-| /-> |-|
|-| / |-|
- <-/ -
Modification:
- -
|-| |-|
|-| |-|
- <------> -
Rust programming
As for the details of using rust for this, I stumbled on a state machine pattern:
let mut state = State::Start;
loop {
state = match state {
...
State::Finished => break
}
}
... which I found pretty nice.
I also found using an enum for the state variable along with associating data with each state a useful, flexible and elegant way to go. The program basically becomes just a single enum and the looping pattern above, along with a few data structures and some methods. Is this normal/idiomatic for rust??
The only hiccough I ran into, apart from not knowing how to do things like read standard input or a file was thinking I could trivially match on two variables at a time. I wrote about that recently here. Once the compiler got angry at me about that I got worried that I didn't know what I was doing at all, but AFAICT, it was a silly idea in the first place and best avoided for the sort of the thing I was trying to do.
Another minor issue I ran into was that I was tracking line numbers throughout the program, which were used as indices into a vector of strings, and I both wanted to do arithmetic on these numbers but also compare to the lengths of the vectors of lines. I settled on sticking with usize
types for all such numbers, which required some conversions ... but writing this now I think I was just scared and should have just converted and stored the vector lengths to a default i32 from the beginning and made things easier ... oh well.
I'll post the code in a separate comment below to keep things clean.