Today I was asked to convert an XML file full of e-mail addresses into JSON. I thought this solution might help others learn how to create an XML file, parse it, and convert it to JSON.

This tutorial requires you to create four files:

  1. my_app/Gemfile manages third-party libraries.
  2. my_app/email_builder.rb generates XML data using the faker library.
  3. my_app/email_parser.rb parses the XML data and converts it to JSON.
  4. my_app/lib/file_manager.rb handles creating, loading, and saving files.

If you prefer to download an example application, it's available on GitHub.

Step 0 - Getting Started

Create two folders:

  1. Project folder named my_app.
  2. Lib folder named my_app/lib.

cd ~/Desktop
mkdir my_app
mkdir my_app/lib

Step 1 - Create Gemfile

Create my_app/Gemfile and paste this.

# frozen_string_literal: true

source "https://rubygems.org"

git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }

gem 'faker', :git => 'https://github.com/faker-ruby/faker.git', :branch => 'master'
gem "populator", '~> 1.0.0'
gem "excelsior", '~> 0.1.0'
gem "nokogiri", '~> 1.8.0'
gem "activesupport"
gem "rexml"

After pasting the libraries, install them.

bundle install

Step 2 - Create email_builder.rb

Create my_app/email_builder.rb and paste this.

#!/usr/bin/env ruby

require 'date'
require 'faker'
require 'nokogiri'
require File.join(File.dirname(__FILE__), 'lib', 'file_manager')

class EmailUtil
  include FileManager
  
  NUM_OF_EMAILS = 5000
  FILE_PATH     = "content-#{Date.today.to_s}.xml"

  def initialize
    xml = create_xml
    save(xml, FILE_PATH)
  end
  
  # http://stackoverflow.com/a/27065613
  def create_xml
    builder = Nokogiri::XML::Builder.new(encoding: 'UTF-8') do |xml|
      xml.data {

        NUM_OF_EMAILS.times do
          first_name = Faker::Name.first_name
          last_name  = Faker::Name.last_name
          
          xml.option(
          # Element text: an e-mail address followed by the full name in
          # parentheses; the parentheses are left empty about half the time.
          %{#{Faker::Internet.email} (#{['', first_name + " " + last_name].sample})},
          first_name:   ['', first_name].sample,
          last_name:    ['', last_name].sample,
          zip_code:     ['', Faker::Address.zip_code].sample,
          gender:       ["m", "f", "o"].sample,
          dob:          Faker::Date.between(from: Date.parse("1st Jan 1920"), to: Date.parse("1st Jan #{min_age_requirement}")),
          phone_mobile: ['', Faker::PhoneNumber.cell_phone].sample,
          phone_other:  ['', Faker::PhoneNumber.phone_number].sample
          )
        end
        
      }
    end
    builder.to_xml
  end

  private 
      
  def min_age_requirement
    this_year = Time.now.year
    min_age   = 13
    this_year - min_age
  end  
end

EmailUtil.new
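
If you want to see the shape of the XML before generating 5,000 records, here is a minimal sketch that builds a single option element by hand. It swaps in REXML (the library the parser uses later) for Nokogiri, so it is not what email_builder.rb itself does. The sample_xml method and every value in it are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical helper: builds a one-record document in the same shape
# email_builder.rb writes. All values here are made-up sample data.
def sample_xml
  doc  = REXML::Document.new
  data = doc.add_element('data')

  option = data.add_element('option', {
    'first_name' => 'Ada',
    'last_name'  => '',            # attributes may be empty strings
    'gender'     => 'f',
    'dob'        => '1990-04-12'
  })
  # Element text: e-mail address, then the optional full name in parentheses
  option.text = 'ada@example.com (Ada Lovelace)'

  doc.to_s
end

puts sample_xml
```

The real file has the same layout: one data root with many option children, where the contact details live in attributes and the address plus name live in the element text.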

Step 3 - Create email_parser.rb

Create my_app/email_parser.rb and paste this.

#!/usr/bin/env ruby
require 'date'
require 'json'
require 'active_support'
require 'active_support/json'
require 'rubygems'
require "rexml/document"

require File.join(File.dirname(__FILE__), 'lib', 'file_manager')

class List
  include FileManager

  FILE_XML  = "content-#{Date.today.to_s}.xml"
  FILE_JSON = "content-#{Date.today.to_s}.json" 
  
  # Use the first command-line argument as the XML path, falling back to
  # today's generated file.
  def initialize(input = ARGV.first || FILE_XML)
    unless File.exist?(input)
      puts "Could not find #{input}"
      puts "For example: ruby email_parser.rb './path/to/file.xml'"
      exit 1
    end

    puts "Parsing #{input}"
    file = load(input)
    json = parse(file)
    save(json, FILE_JSON)
  end
  
  def parse(file)
    # Collect one hash per contact
    contacts = []
    # Parse the file contents as XML
    doc = REXML::Document.new(file)
    # Iterate through each <option> node under <data>
    doc.elements.each("data/option") { |e|
      # The node text looks like "email (Full Name)"; the name may be empty
      arr   = e.text.to_s.split("(", 2)
      # Use strip, not strip!, which returns nil when nothing was trimmed
      email = arr[0].to_s.strip
      name  = arr[1].to_s[/[^)]+/].to_s.strip
      
      
      contacts.push({
        "email": email, 
        "full_name": name,
        "first_name":   e.attributes["first_name"],
        "last_name":    e.attributes["last_name"],
        "zip_code":     e.attributes["zip_code"],
        "gender":       e.attributes["gender"],
        "dob":          e.attributes["dob"],
        "phone_mobile": e.attributes["phone_mobile"],
        "phone_other":  e.attributes["phone_other"]
      })      
    }
    # https://www.rubydoc.info/docs/rails/4.1.7/ActiveSupport/JSON/Encoding#json_encoder-class_method
    ActiveSupport::JSON.encode(contacts)
  end  
end

List.new
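
To make the parsing step easier to follow, here is a minimal, self-contained sketch of the same idea on a two-record sample. It uses the standard library's JSON.generate rather than ActiveSupport so it runs without extra gems. SAMPLE_XML, split_contact, and contacts_from are hypothetical names that do not appear in email_parser.rb, and only a few of the attributes are copied over.

```ruby
require 'json'
require 'rexml/document'

# Made-up sample in the same shape the builder writes.
SAMPLE_XML = <<~XML
  <data>
    <option first_name="Ada" gender="f">ada@example.com (Ada Lovelace)</option>
    <option first_name="" gender="o">anon@example.com ()</option>
  </data>
XML

# Hypothetical helper: split "address (Full Name)" into [email, name].
def split_contact(text)
  email, rest = text.to_s.split('(', 2)
  [email.to_s.strip, rest.to_s[/[^)]+/].to_s.strip]
end

# Hypothetical helper: turn every <option> node into a hash.
def contacts_from(xml)
  contacts = []
  REXML::Document.new(xml).elements.each('data/option') do |e|
    email, name = split_contact(e.text)
    contacts << { email: email, full_name: name, gender: e.attributes['gender'] }
  end
  contacts
end

puts JSON.generate(contacts_from(SAMPLE_XML))
```

Splitting on the first "(" only (the second argument to split) keeps any later parenthesis inside the name part instead of discarding it.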

Step 4 - Create file_manager.rb module

Create my_app/lib/file_manager.rb and paste this.

#!/usr/bin/env ruby

require 'fileutils'

module FileManager
  APP_ROOT    = File.dirname(__FILE__)
  OUTPUT_DIR  = "output"

  def destroy_dir
    puts "destroy_dir"
    FileUtils.rm_rf( OUTPUT_DIR )
  end
  
  def create_dir
    Dir.mkdir( OUTPUT_DIR ) unless Dir.exist?( OUTPUT_DIR )
    # Add the output directory to Ruby's load path
    $:.unshift( File.join(APP_ROOT, OUTPUT_DIR ) )
  end

  def create_file(file_path)
    File.join(OUTPUT_DIR, file_path)
  end
  
  def load(file)
    # Read the whole file and close the handle right away
    File.read(file)
  end
  
  def save(data, path)
    # Open the file for writing; the block form closes it when done,
    # which also flushes the data to disk
    File.open(path, "w") { |output| output.puts data }
  end
end
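
The module defines create_dir and create_file, but neither script in this tutorial calls them, so the XML and JSON files land in whatever folder you run the scripts from. If you would rather write into an output folder, one possible approach is sketched below. The output_path helper is hypothetical and not part of lib/file_manager.rb, and the sketch works in a temporary directory so it does not touch your project.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: build a path inside an output directory, creating
# the directory first if it is missing.
def output_path(root, file_name, output_dir = 'output')
  dir = File.join(root, output_dir)
  FileUtils.mkdir_p(dir)   # unlike Dir.mkdir, no error if it already exists
  File.join(dir, file_name)
end

Dir.mktmpdir do |root|
  path = output_path(root, 'content.json')
  File.open(path, 'w') { |f| f.puts '[]' }
  puts File.read(path)
end
```

File.join picks the right path separator for the platform, which is why the helper never concatenates strings with "/" directly.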

Step 5 - Let's Build!

From the my_app folder, create an XML file using the faker library.

ruby email_builder.rb

Parse the XML file and create a JSON file.

ruby email_parser.rb
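
Once both scripts have run, a quick sanity check is to load the JSON and count the records. The sketch below does that on an inline sample; summarize is a hypothetical helper, and the sample data is made up.

```ruby
require 'json'

# Hypothetical helper: count all contacts and those with a full name.
def summarize(json_text)
  contacts = JSON.parse(json_text)
  named    = contacts.count { |c| !c['full_name'].to_s.empty? }
  { total: contacts.size, named: named }
end

sample = '[{"email":"a@example.com","full_name":"Ada Lovelace"},' \
         '{"email":"b@example.com","full_name":""}]'
p summarize(sample)
```

To check your own output, pass File.read with the path of the generated JSON file in place of sample. With the builder's default settings, total should come out at 5000.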