RPCFN: XML Transformer (#8)

by Satish Talim on April 7, 2010

Ruby Programming Challenge For Newbies


By Jamie van Dyke

About Jamie van Dyke

Jamie van Dyke has been using Ruby and Rails since the beginning of 2005. He has contributed significantly to the Rails documentation and code base, has run his own Rails business, and was responsible for building Engine Yard’s European support team. Jamie is now the CTO at Boxedup.

Jamie has this to say about the challenge:

This challenge is ideal for both beginner and advanced users. You can solve it in multiple ways and with differing levels of complexity, each level giving more flexibility. I hope you enjoy trying to solve the challenge and I look forward to seeing the results.

Our Awesome Sponsors

This monthly programming challenge is co-sponsored by Eden Development and Backup My App.


Eden Development is a leading software development firm based in Winchester, UK, specialising in Ruby applications. They craft dependable, flexible and beautifully made software which meets real business needs. They stand for quality, integrity, real craftsmanship and peace of mind for their customers. They also teach basic Ruby as part of Agile/XP courses: they love learning themselves and are delighted to support the RubyLearning initiative.

Backup My App

Backup My App is an automatic backup service for Ruby on Rails applications. You simply install the plugin to your Rails application and they handle the rest of the process. They store backup history for several weeks and you can restore any of them automatically. Try out their 1 GB plan for free. Backup My App has co-sponsored this challenge and is proud to make this contribution to the Ruby community.

Prizes

  • The participant with the best Ruby solution (if there is a tie between answers, then the one who posted first will be the winner) will be awarded any one of PeepCode’s Ruby on Rails screencasts and a free 10 GB account for a year from Backup My App.
  • From the remaining working Ruby solutions, three participants will be selected randomly, and each will be awarded any one of Pragmatic’s The Ruby Object Model and Metaprogramming screencasts.
  • All the participants in this challenge (except the participant with the best Ruby solution) will get a free 5 GB account for 6 months from Backup My App.

The four winners cannot win again in the immediately following challenge, but can still participate.

The Ruby Challenge


The Challenge

Introduction

I love a good challenge and find it helps you discover different aspects of a language, and also different methods to achieve the same goal. Seeing how others completed a quiz also helps you to expand your knowledge and experience which will give you more insight in the future.

The best tests are always ones based on real-world examples. I’ve tried to simplify this one so it’s accessible to everyone who wants to take a stab, while at the same time giving those who want to be more advanced the opportunity to show off a little. You could complete this with a simple solution, or spend more time and give an elegant, more Ruby-ish answer. You choose.

A library I recently wrote had to import data on a regular basis. I needed to normalise the data before I could import it, because it came from different XML feeds in unknown formats. So, your challenge is to build an XML transformer that can take any (within reason) XML file and change it into an expected XML format.

Specifics

You can download 3 source XML files as examples. You need to work out how to convert each of these files into the expected output, bearing in mind that these are merely examples, so you should make your transformer as flexible as possible to handle other source inputs.

You will need an XML library; personally I use the nokogiri rubygem, but feel free to choose your own. Once you’ve decided on an XML parser, it’s up to you how you go about solving this quiz. I’ve designed it to give you the freedom to solve it however you see fit; the only stipulation is that you stick to Ruby!
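To make the idea concrete, here is a minimal sketch of such a transformer using REXML from Ruby’s standard library instead of nokogiri. Everything in it is an assumption for illustration: the target shape (people/person/name with first and last children), the rule that each child of the source root is one record, the hypothetical alias table `FIELD_ALIASES`, and the invented helper names `each_leaf` and `transform`. It is not taken from any entry.

```ruby
require 'rexml/document'

# Hypothetical alias table: lower-cased source tag names mapped to the
# field names assumed for the target format.
FIELD_ALIASES = {
  'first' => 'first', 'first_name' => 'first', 'firstname' => 'first',
  'last'  => 'last',  'last_name'  => 'last',  'surname'   => 'last'
}.freeze

# Yields every leaf element (one with no child elements) below +element+,
# so both flat and nested source records are handled the same way.
def each_leaf(element, &block)
  element.elements.each do |child|
    if child.has_elements?
      each_leaf(child, &block)
    else
      yield child
    end
  end
end

# Assumption: each child of the source root element is one person record.
def transform(source_xml)
  source = REXML::Document.new(source_xml)
  result = REXML::Document.new
  people = result.add_element('people')

  source.root.elements.each do |record|
    fields = {}
    each_leaf(record) do |leaf|
      key = FIELD_ALIASES[leaf.name.downcase]
      fields[key] = leaf.text.to_s.strip if key
    end

    name = people.add_element('person').add_element('name')
    %w[first last].each do |field|
      name.add_element(field).text = fields[field] if fields.key?(field)
    end
  end

  output = String.new
  result.write(output) # compact output, no XML declaration
  output
end
```

Because `each_leaf` walks down to elements without children, a flat `<first_name>` and a nested `<name><first>` end up as the same field; supporting a new feed in this design mostly means extending the alias table.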

Additional Information

Qs. Do we have to use an XML parser?

Ans. Well, no, but it will probably be easier if you do.

Qs. Is it okay to assume that the source file has only first names and last names? It has no other data, e.g. ages, sexes, … ?

Ans. In this example there are only 2 fields that you need to worry about. Bonus points go to solutions that can handle multiple formats with multiple fields. Of course, any file that has 3 fields in the source should output 3 fields in the result.
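One way to honour the "3 fields in, 3 fields out" rule is to rename only the fields you recognise and pass every other field through under its own tag name. A sketch with REXML, assuming a flat record and a hypothetical `ALIASES` table; `record_fields` and `person_element` are illustrative names:

```ruby
require 'rexml/document'

# Hypothetical aliases for the two known fields; any other tag keeps its
# own name, so a record with three source fields yields three output fields.
ALIASES = { 'first_name' => 'first', 'surname' => 'last', 'last_name' => 'last' }.freeze

# Turns the direct children of one record element into a field Hash.
def record_fields(record)
  record.elements.to_a.each_with_object({}) do |el, fields|
    fields[ALIASES.fetch(el.name, el.name)] = el.text.to_s.strip
  end
end

# Writes the fields back out, in order, as children of a new <person>.
def person_element(fields)
  person = REXML::Element.new('person')
  fields.each { |name, value| person.add_element(name).text = value }
  person
end
```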

Qs. Is it necessary to indent output data (result xml)?

Ans. The results may ignore whitespace outside of elements but not inside.
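A simple way to compare results under that rule is to drop whitespace that sits only between tags while leaving text content alone. This regex-based sketch (the helper names are hypothetical) assumes well-formed input without comments or CDATA; note that an element whose entire content is whitespace would also be emptied by it.

```ruby
# Drops whitespace that sits only between a '>' and the following '<'
# (indentation and newlines between tags); text such as "Ada Lovelace"
# keeps its inner spaces.
def strip_inter_tag_whitespace(xml)
  xml.strip.gsub(/>\s+</, '><')
end

# Two documents count as equal if they differ only in that whitespace.
def same_xml?(expected, actual)
  strip_inter_tag_whitespace(expected) == strip_inter_tag_whitespace(actual)
end
```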

How to Enter the Challenge

Read the Challenge Rules. By participating in this challenge, you agree to be bound by these Challenge Rules. It’s free and registration is optional. You can enter the challenge just by posting the following as a comment to this blog post:

  1. Your name:
  2. Country of Residence:
  3. GIST URL of your Solution (i.e. Ruby code) with explanation and / or test cases:
  4. Code works with Ruby 1.8 / 1.9 / Both:
  5. Email address (will not be published):
  6. Brief description of what you do (will not be published):

Note:

  • As soon as we receive your GIST URL, we will fork your submission. This means that your solution is frozen and accepted. Please be sure that it is the solution you want, as it is now recorded in time and is the version that will be evaluated.
  • All solutions posted will be hidden to allow participants to come up with their own solutions.
  • You should post your entries before midnight of 26th Apr. 2010 (Indian Standard Time). No new solutions will be accepted from 27th Apr. onwards.
  • On 27th Apr. 2010 all the solutions will be thrown open for everyone to see and comment upon.
  • The winning entries will be announced on this blog before 30th Apr. 2010. The winners will be sent their prizes by email.

More details on the RPCFN?

Please refer to the RPCFN FAQ for answers to the following questions:

Donations

RPCFN is entirely financed by RubyLearning and sometimes sponsors, so if you enjoy solving Ruby problems and would like to give something back by helping with the running costs then any donations are gratefully received.


Acknowledgements

Special thanks to:

  • Jamie van Dyke.
  • Sponsors Eden Development and Backup My App.
  • GitHub, for giving us access to a private repository on GitHub to store all the submitted solutions.
  • The RubyLearning team, namely Satoshi Asakawa (Japan) and Victor Goff III (USA).

Questions?

Contact Satish Talim at satish [dot] talim [at] gmail.com OR if you have any doubts / questions about the challenge (the current problem statement), please post them as comments to this post and the author will reply asap.

The Participants

There are two categories of participants. Some are vying for the prizes and some are participating for the fun of it.

In the competition

  1. Richard Colley, Australia
  2. Paul Barry, USA – declared winner (best solution)
  3. John Prince, USA – declared winner (randomly selected)
  4. Tanzeeb Khalili, Canada – declared winner (randomly selected)
  5. Rémy Coutable, France
  6. Adam Lum, USA – declared winner (randomly selected)
  7. Vijay Thiruvallur, India
  8. Benoit Daloze, Belgium

Just for Fun

The Winners


Congratulations to the winners of this Ruby Challenge. They are:

Previous Challenge

RPCFN: Broadsides (#7) by James Edward Gray II.

Note: All the previous challenges, sponsors and winners can be seen on the Ruby Programming Challenge for Newbies page.

Update

  • This challenge is now closed.
  • The (#9) challenge by Avdi Grimm, USA is scheduled for 1st May 2010.


Posted by Satish Talim



Paul Barry April 7, 2010 at 6:55 pm

Solution (#2):

Paul Barry, Baltimore, MD, USA
Original: https://gist.github.com/9575443f97f8f09bf104
To run it, git clone the repo, make sure you have rubygems and nokogiri installed, then run “ruby transformer_test.rb”


Satoshi Asakawa April 29, 2010 at 4:55 am

I wrote a unit test that compares the output for just the three source XML files, and ran it against this solution.
The result is:

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started

Finished in 0.031250 seconds.

3 tests, 3 assertions, 0 failures, 0 errors, 0 skips

Congrats!


Richard Colley April 29, 2010 at 6:24 am

Congratulations Paul! Your solution looks very elegant and Ruby-ish.

I’m curious though, how much code would you need to change to add extra formats or elements into an existing format?


Benoit Daloze April 7, 2010 at 8:41 pm

Hi,
I just got my hands dirty solving the set of examples, but I feel I’m far from a flexible solution and wonder what flexibility should mean here :)

So how are we supposed to know that surname is the same as last_name or name/last? I actually made an “aliases” Hash, but that only handles this case as long as I don’t have more data.

Could you specify a bit more precisely which features should be implemented?
I understood these two:
- tags containing ‘_’ must be nested inside each other, in reverse order:
=>
- the examples should follow a naming convention:
[people, person, name, [first, last]]

“[...] and change it into an expected XML format.”
Should we guess the format by analyzing the result?

Thanks for this nice challenge,
B.D.
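The underscore convention described in the comment above can be sketched in a few lines. This follows the commenter’s reading of the examples, not a confirmed rule of the challenge; `nesting_path`, `escape_text` and `nest` are hypothetical helpers.

```ruby
# Interpretation from the comment above: the parts of an underscore tag
# name, read in reverse, give the nesting path from outermost to innermost.
def nesting_path(tag)
  tag.split('_').reverse
end

# Minimal escaping of text content for this sketch.
def escape_text(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Wraps the value innermost-first: "first_name" wraps it in <first>,
# then wraps that in <name>.
def nest(tag, value)
  tag.split('_').reduce(escape_text(value)) { |inner, part| "<#{part}>#{inner}</#{part}>" }
end
```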


Richard Colley April 8, 2010 at 8:49 am

Solution (#1):

Richard Colley, Australia
Original: https://gist.github.com/003a4dec8c2f781647f2 now forked
Works with Ruby 1.8 and 1.9


Satoshi Asakawa April 29, 2010 at 4:57 am

I wrote a unit test that compares the output for just the three source XML files, and ran it against this solution.
The result is:

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started
FFF
Finished in 0.125000 seconds.

1) Failure:
test_source1(TransformerTest) [test_transformer.rb:15]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePooh
JamietheWeeh”>.

2) Failure:
test_source2(TransformerTest) [test_transformer.rb:15]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePooh
JamietheWeeh”>.

3) Failure:
test_source3(TransformerTest) [test_transformer.rb:15]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePooh
JamietheWeeh”>.

3 tests, 3 assertions, 3 failures, 0 errors, 0 skips

The “ seemed to be superfluous, so I removed them from his output.
Then I could get the following output.

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started

Finished in 0.046875 seconds.

3 tests, 3 assertions, 0 failures, 0 errors, 0 skips


Richard Colley April 29, 2010 at 6:20 am

Hi Satoshi … can you post these tests somewhere?

What caused the tests to fail? Visually they look the same? Was it just a whitespace issue?


Jamie van Dyke April 8, 2010 at 12:19 pm

The expected result is given in the bundle, it’s called results.xml.

The flexibility I mentioned is up to you. You could finish the challenge so only the source files can be transformed to the result, or you could implement a solution that can transform new variations that you give it.

You could also go halfway and just assume that the results.xml file will always just contain the elements I’ve supplied, but devise a solution that will take any source input configuration.

I hope that helps.


John Prince April 14, 2010 at 5:22 am

Solution (#3):

John Prince, USA
Original: http://gist.github.com/365268
A simple command-line program that processes multiple files at once. Since all three XML source files used a one-element-per-line style, I assumed this would always be the case. I also assumed the elements always appear in the same order (first, last) as in the source files.
Works on Ruby 1.8 and 1.9
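For comparison, the line-per-element assumption described above means a solution can get away without an XML parser at all. A minimal sketch of that idea (not the submitted code), with hypothetical method names and a regex that only matches a complete element on one line:

```ruby
# One complete element on a single line, e.g. "  <first>Ada</first>".
LINE_ELEMENT = %r{<(\w+)>([^<]*)</\1>}

# Returns [tag, text] pairs for every line holding a complete element;
# wrapper lines such as "<person>" or "</people>" do not match.
def line_fields(xml)
  xml.each_line.map { |line| line.match(LINE_ELEMENT) }.compact.map { |m| [m[1], m[2]] }
end

# Relies on the fixed (first, last) order: every two values form one person.
def people_from_lines(xml)
  line_fields(xml).each_slice(2).map { |(_t1, first), (_t2, last)| [first, last] }
end
```

This breaks as soon as a feed puts two elements on one line, nests differently, or changes the field order, which is exactly the trade-off the entry states.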


Satoshi Asakawa April 29, 2010 at 4:58 am

I wrote a unit test that compares the output for just the three source XML files, and ran it against this solution.
The result is:

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started
FFF
Finished in 0.015625 seconds.

1) Failure:
test_source1(TransformerTest) [test_transformer.rb:13]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

2) Failure:
test_source2(TransformerTest) [test_transformer.rb:13]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

3) Failure:
test_source3(TransformerTest) [test_transformer.rb:13]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

3 tests, 3 assertions, 3 failures, 0 errors, 0 skips

The `encoding="UTF-8"` declaration seemed to be superfluous, so I removed it from his outputs.
Then I got the following result.

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started

Finished in 0.078125 seconds.

3 tests, 3 assertions, 0 failures, 0 errors, 0 skips


Tanzeeb Khalili April 16, 2010 at 1:22 am

Solution (#4):

Tanzeeb Khalili, Canada
http://gist.github.com/367589 (fairly simple: nokogiri + XPath to match, xmlbuilder to write); invoke with ‘ruby rpcfn8.rb output’
Works in 1.8
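The "XPath to match" approach mentioned above can be illustrated with `REXML::XPath.first` from Ruby’s standard library. This is a swapped-in sketch, not the nokogiri code from the gist, and the candidate paths in `CANDIDATE_PATHS` are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical candidate paths per output field, relative to one record
# and tried in order; the first path that matches supplies the value.
CANDIDATE_PATHS = {
  'first' => ['first', 'first_name', 'name/first'],
  'last'  => ['last', 'last_name', 'surname', 'name/last']
}.freeze

def match_fields(record)
  CANDIDATE_PATHS.each_with_object({}) do |(field, paths), fields|
    hit = paths.map { |path| REXML::XPath.first(record, path) }.compact.first
    fields[field] = hit.text if hit
  end
end
```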


Satoshi Asakawa April 29, 2010 at 4:58 am

I wrote a unit test that compares the output for just the three source XML files, and ran it against this solution.
The result is:

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started

Finished in 0.015625 seconds.

3 tests, 3 assertions, 0 failures, 0 errors, 0 skips

Congrats!


Richard Colley April 29, 2010 at 7:11 am

Hi Tanzeeb. I actually like your solution best (after my own, of course hehe :). While it doesn’t use the Rubyists’ favourite solution (DSLs), the code looks clean, simple, and easy to extend to new formats. Well done!


Tanzeeb Khalili April 29, 2010 at 11:02 pm

Thanks Richard, seems like we had very similar approaches. :-)


Tanzeeb Khalili April 29, 2010 at 11:03 pm

Just a correction:

To run my test,

ruby rpcfn8.rb < input > output

You can also just pipe the input to ‘ruby rpcfn8.rb’.
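Reading from standard input and writing to standard output is what makes both invocations work. A minimal sketch, assuming a hypothetical `run_filter` helper that takes the transformation as any callable:

```ruby
require 'stringio'

# Reads the whole document from +input+, passes it through +transform+
# (any object that responds to #call with a String) and writes the result.
def run_filter(transform, input = $stdin, output = $stdout)
  output.write(transform.call(input.read))
end

# Convenience wrapper, handy in tests: runs the filter on a String.
def filter_string(transform, text)
  out = StringIO.new
  run_filter(transform, StringIO.new(text), out)
  out.string
end
```

A script would typically end with something like `run_filter(method(:transform))`, where `transform` is whatever String-to-String method the solution defines (a hypothetical name here).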


rymai April 19, 2010 at 12:29 am

Solution (#5):

Rémy Coutable
Solution: https://gist.github.com/33df1990c11830e94022
Code works with Ruby 1.8 & 1.9


Satoshi Asakawa April 29, 2010 at 4:59 am

I wrote a unit test that compares the output for just the three source XML files, and ran it against this solution.
The result is:

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started
FFF
Finished in 0.031250 seconds.

1) Failure:
test_source1(TransformerTest) [test_transformer.rb:14]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

2) Failure:
test_source2(TransformerTest) [test_transformer.rb:14]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

3) Failure:
test_source3(TransformerTest) [test_transformer.rb:14]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

3 tests, 3 assertions, 3 failures, 0 errors, 0 skips

The `encoding="UTF-8"` declaration seemed to be superfluous, so I removed it from his outputs.
Then I got the following result.

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started

Finished in 0.031250 seconds.

3 tests, 3 assertions, 0 failures, 0 errors, 0 skips


Adam Lum April 23, 2010 at 11:34 pm

Solution (#6):

Adam Lum, USA
https://gist.github.com/6e9d427b0ff45eb534ec
Source file names are provided via command line argument, tested with the source files provided.
Code works with Ruby 1.9.1


Satoshi Asakawa April 29, 2010 at 5:00 am

I wrote a unit test that compares the output for just the three source XML files, and ran it against this solution.
The result is:

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started
FFF
Finished in 0.047 seconds.

1) Failure:
test_source1(TransformerTest)
[test_transformer.rb:11:in `stub_method'
test_transformer.rb:15:in `test_source1']:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

2) Failure:
test_source2(TransformerTest)
[test_transformer.rb:11:in `stub_method'
test_transformer.rb:19:in `test_source2']:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

3) Failure:
test_source3(TransformerTest)
[test_transformer.rb:11:in `stub_method'
test_transformer.rb:23:in `test_source3']:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

3 tests, 3 assertions, 3 failures, 0 errors

The `People` and `Name` tags were capitalized, and the `1.0` was surrounded by single quotes instead of double quotes, so I edited it a bit and got the following.

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started

Finished in 0.0 seconds.

3 tests, 3 assertions, 0 failures, 0 errors


Cary Swoveland April 25, 2010 at 5:50 am

Jamie,

I’m not going to be able to finish in time to enter the competition, but I think it was a great challenge. I chose to parse the XML document myself, for the educational value, and learned a lot, especially about the use of classes and ‘self’. It’s my third Ruby program, and also my third object-oriented program. Thanks!

Cary


Jamie van Dyke April 28, 2010 at 10:10 pm

Cary,

I’m glad you enjoyed it. It was extracted from a real-life problem I faced last year that I realised could have varying degrees of flexibility in the solution. In the end I chose a medium-strength option that did the job nicely.

Cheers,
Jamie


Vijay Thiruvallur April 26, 2010 at 7:48 pm

Solution (#7):

Vijay Thiruvallur, India
http://gist.github.com/379418
This code works on Ruby 1.8.7


Satoshi Asakawa April 29, 2010 at 5:00 am

I wrote a unit test that compares the output for just the three source XML files, and ran it against this solution.
The result is:

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started
FFF
Finished in 0.046875 seconds.

1) Failure:
test_source1(TransformerTest) [test_transformer.rb:14]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

2) Failure:
test_source2(TransformerTest) [test_transformer.rb:14]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

3) Failure:
test_source3(TransformerTest) [test_transformer.rb:14]:
<"WinniethePooh
JamietheWeeh”> expected but was
<"WinniethePoohJamietheWeeh”>.

3 tests, 3 assertions, 3 failures, 0 errors, 0 skips

There was no `name` tag in the result, so I edited it a bit and got the following.

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started

Finished in 0.031250 seconds.

3 tests, 3 assertions, 0 failures, 0 errors, 0 skips


Benoit Daloze April 26, 2010 at 10:14 pm

Solution (#8):

Benoit Daloze, Belgium
https://gist.github.com/818225a4c22a01f26a4a
Code works with Ruby 1.9.2 from r25032 onwards (for Enumerable#slice_before),
or possibly with http://github.com/marcandre/backports
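`Enumerable#slice_before` (built into Ruby from 1.9.2, as noted above) is handy for turning a flat stream of elements into records. A small sketch, assuming every record starts with a `first` field; `group_records` is a hypothetical name and the sample data in the test is invented.

```ruby
# Groups a flat list of [tag, text] pairs into one Hash per record.
# slice_before opens a new chunk whenever the block returns true, here at
# every "first" tag, so each chunk holds one person's fields.
def group_records(pairs)
  pairs.slice_before { |pair| pair.first == 'first' }.map { |chunk| Hash[chunk] }
end
```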


Satoshi Asakawa April 29, 2010 at 5:01 am

I wrote a unit test that compares the output for just the three source XML files, and ran it against this solution.
The result is:

C:\rubyprograms>ruby test_transformer.rb
Loaded suite test_transformer
Started

Finished in 0.296875 seconds.

3 tests, 3 assertions, 0 failures, 0 errors, 0 skips

I got this result with backports on Ruby 1.9.1. Great!


Benoit Daloze April 29, 2010 at 7:25 pm

Just to mention: I apparently wrote the only flexible solution, which needs no changes to work on other files (have a look at the 2_* files in the gist).

The downside is that the code is quite long and hard to read, but that’s the price of full flexibility.

Really nice, short solutions using a DSL or a Hash in this quiz :)


ashbb April 29, 2010 at 11:39 am

I uploaded the results of a tiny unit test to a GitHub repo:

http://github.com/IndianGuru/RPCFN8

Thanks to all of you for your great solutions. :)


Richard Colley April 29, 2010 at 1:05 pm

Thanks for that ashbb (Satoshi?).

Now I can see why I failed the unit tests … it was because I had left a rule to parse an age attribute when I was testing the extensibility of my YAML based rules.

Good to know there were no actual problems.


rymai May 4, 2010 at 1:58 am

Hey,

Thanks for this repo with all the solutions along with tests, but it seems there is an error in RPCFN8/05RemyCoutable/test_transformer.rb, lines 14/15: if you uncomment l14 and comment out l15, the tests pass. Why is it currently the opposite?


ashbb May 4, 2010 at 7:23 pm

Hi rymai,

You have good eyes!

I had first confirmed the output of 05RemyCoutable with l14, but the correct (requested) result is l15. So, for now, I’ve kept l15 and commented out l14. ;-)

ashbb


rymaï May 6, 2010 at 3:16 am

Fair enough, but I did that because in my test file I used words with French accents (including my first name, Rémy), so without UTF-8 it isn’t output as Rémy, which is not good.
BTW, I think it would be better to always set the encoding to UTF-8, don’t you?

Rymaï
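Following the suggestion above, a writer can always declare UTF-8 and transcode the body to match. A minimal sketch; `with_utf8_declaration` is a hypothetical helper, and it assumes the body string carries a correct Ruby encoding tag.

```ruby
# Prepends a declaration that states UTF-8 explicitly and transcodes the
# body to UTF-8, so accented names such as "Rémy" are written correctly.
def with_utf8_declaration(body)
  %(<?xml version="1.0" encoding="UTF-8"?>\n) + body.encode('UTF-8')
end
```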


ashbb May 6, 2010 at 6:30 am

Hi Rymaï

You are right. I totally agree with you. It’s better to add the encoding as UTF-8 in a real project. ;-)

ashbb

Richard Colley April 30, 2010 at 8:13 am

Yeah, nice solution Benoit. I see that you took the approach to parse the “result” file to give you the rules to parse the input files. I considered that approach but decided that I wanted something more expressive than XML to describe my rules. I wanted to be able to indicate that I expected a map, or a list of items etc.

So, I ended up using YAML to describe the desired result. However, this YAML is not needed either … the “rules” are eventually built into a tree of standard ruby containers, in a similar concept to how you build your tree.

To test your assertion that yours was the only solution flexible enough to adapt to new formats without code changes, I converted your XML rules defining the output format (2_result.xml) into my representation in YAML. I then ran your input file (2_stracture+aliases.xml) and got the correct output.

You can see my YAML rules here: http://gist.github.com/384665

In summary, I think my solution is at least as flexible as yours (maybe more so, because I can say there must be lists of things, singular things, optional parameters etc.) and yet my code is very short. You store the rules in the “result.xml” file, I store my rules currently in YAML (but you could use JSON, or similar).
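The idea of describing the desired result as data rather than code can be sketched with Ruby’s standard `yaml` library. The rule layout below (each output field lists the source tag names that may supply it) is a hypothetical format, not the one in the gist linked above, and `apply_rules` is an illustrative name.

```ruby
require 'yaml'

# Hypothetical rules: nested keys describe the output tree, and each leaf
# lists the source tag names that may supply that field, in priority order.
RULES = YAML.safe_load(<<~YML)
  people:
    person:
      name:
        first: [first, first_name, firstname]
        last: [last, last_name, surname]
YML

# Walks the rule tree; +source+ is a flat Hash of source tag => text.
def apply_rules(rule, source)
  rule.each_with_object({}) do |(key, sub), out|
    if sub.is_a?(Hash)
      out[key] = apply_rules(sub, source)
    else
      hit = sub.find { |alt| source.key?(alt) }
      out[key] = source[hit] if hit
    end
  end
end
```

Supporting a new feed then means editing the YAML rather than the Ruby code.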

Of course, none of this really means anything, since Paul won for his beautiful rendition of a Ruby DSL, despite how much extra work (comparatively) it would require him to adapt that to different formats.


Benoit Daloze April 30, 2010 at 2:49 pm

“In summary, I think my solution [...]”
It looks like I didn’t pay enough attention to your solution. In fact, it’s very flexible too, as you say.

The point I wanted to make is that you don’t need to modify anything to run the code (except to add aliases inside nodes with more than one child); it simply adapts to the result you want (so if you have different but similar formats, you can just use one as the base).

Another thing: will your solution work on nesting.xml?

I agree your solution is very interesting, and a recursive approach is a good idea (although it might be difficult for nested nodes).


Richard Colley May 3, 2010 at 5:18 am

I gave it a quick try, and your “nesting.xml” file is read and parsed, but not quite output correctly. If I had time to play, I think I could make it work. But now I’m doing the Clojure course … no extra time to play :)

