Sunday, April 5, 2015

Export CCP's SDE Tables to PostgreSQL & Rails ActiveRecord Models

When working with Rails and EVE-Online, one often finds themselves in need of access to CCP's SDE.

The data within CCP's dump changes periodically to reflect patches and other changes to the game. Because of this, an adaptable, easy to duplicate process is best when sourcing your data. This post will cover one strategy for making CCP's SDE data accessible within your rails application on a table by table basis.

This guide assumes you have a functioning postgres install and are familiar with how to get a psql prompt.

For the purposes of this demonstration, we'll be trying to provide easy typeName to typeID lookups within the rails application.

Step One: Acquire a Postgres formatted SDE

You can download a copy of Steve Ronuken's Postgres formatted SDE dump here: https://forums.eveonline.com/default.aspx?g=posts&t=414591&find=unread

Next you'll want to load that into your postgres server. To do so you'll need to create a database by the name of 'eve' and a role by the name of 'eve'. This can be done in psql with the following commands:

 CREATE ROLE eve WITH LOGIN CREATEDB;  
 CREATE DATABASE eve OWNER eve;  

A successful role creation will print "CREATE ROLE" to the screen, and a successful database creation will print "CREATE DATABASE" to the screen.

With role and database created and configured, you now need to populate the database. This can be done with pg_restore (thanks Steve!).

pg_restore <postgres_datadump> -d eve

EG:
pg_restore postgres-scylla-1.111482.sql -d eve  

A successful restore is silent. So if you get any errors, they'll be printed to the screen and you'll see a lovely avalanche of broken stuff.


Step Two: Identify the data you're looking for

In this case we're looking for typeID & typeName columns. These can be found in the invTypes table. It is important to note that the SDE does not use snake cased table names as Rails users may be accustomed to. For this reason you'll need to wrap any table names in your select queries in quotes. Thus instead of writing

 SELECT * FROM invTypes;  

You'll need to write

 SELECT * FROM "invTypes";  

note the quotes.


Step Three: Determine the table's schema

Once you've identified the location of the data you're looking for, you'll need to figure out what format that data is being stored in. This can be done using the psql \d+ command.

 \d+ <tablename>;  

EG:
 \d+ "invTypes";  

This command will list all columns, types, modifiers, indexes, etc on the table in question. It's output will look like this:

                  Table "public.invTypes"  
     Column    |    Type    | Modifiers | Storage | Stats target | Description   
 ---------------------+------------------+-----------+----------+--------------+-------------  
  typeID       | bigint      | not null | plain  |       |   
  groupID       | bigint      |      | plain  |       |   
  typeName      | text       |      | extended |       |   
  description     | text       |      | extended |       |   
  mass        | double precision |      | plain  |       |   
  volume       | double precision |      | plain  |       |   
  capacity      | double precision |      | plain  |       |   
  portionSize     | bigint      |      | plain  |       |   
  raceID       | smallint     |      | plain  |       |   
  basePrice      | numeric(19,4)  |      | main   |       |   
  published      | boolean     |      | plain  |       |   
  marketGroupID    | bigint      |      | plain  |       |   
  chanceOfDuplicating | double precision |      | plain  |       |   
 Indexes:  
   "invTypes_pkey" PRIMARY KEY, btree ("typeID")  
   "idx_149637_invTypes_IX_Group" btree ("groupID")  
 eve=#   


Step Four: Define the Rails Model

You can name the model whatever you see fit, but I personally prefer to keep the name as close to CCP's table names as possible.

 rails generate model invType typeID:bigint groupID:bigint typeName:text description:text mass:decimal volume:decimal capacity:decimal portionSize:bigint basePrice:decimal published:boolean marketGroupID:bigint chanceofDuplicating:bigint  

You'll then want to set id: false, change typeID to primary_key, and set the precision of each decimal. This will leave your migration looking something like this:

 class CreateInvTypes < ActiveRecord::Migration  
  def change  
   create_table :invTypes, id: false do |t|  
    t.primary_key :typeID  
    t.bigint :groupID  
    t.text :typeName  
    t.text :description  
    t.decimal :mass, :precision => 64, :scale => 12
    t.decimal :volume, :precision => 64, :scale => 12
    t.decimal :capacity, :precision => 64, :scale => 12  
    t.bigint :portionSize  
    t.decimal :basePrice, :precision => 19, :scale => 4  
    t.boolean :published  
    t.bigint :marketGroupID  
    t.bigint :chanceofDuplicating  
    t.timestamps null: false  
   end  
  end  
 end  

Getting the migration right can be a pain in the butt, so don't be surprised if this stage takes a while.


Step Six: Link your newly generated table & model.

ActiveRecord doesn't play nice with non-snake_cased table names, and the migration above creates a camel cased table name. To fix this issue, you'll need to set a custom table mapping on your model. This is done with the 'table_name' method in rails four and will leave your model looking something like this:

 class InvType < ActiveRecord::Base  
  self.table_name = "invTypes"  
 end  

At this point you should be able to open up your rails console and perform all the traditional CRUD actions on your new table.


Step Five: Copy the Table

With a rails model association setup, it is now merely a matter of populating the table. To do this, we will simply copy the table straight out of the SDE and right into our database. After which the data should be directly accessible within our rails app!

First drop the table from your rails database. This can be done by connecting to your database in psql and using the drop table command.

EG:
 DROP TABLE "invTypes";  

This will remove the table from the database without disturbing the rails application's schema expectations.

Lastly; copy the table from the SDE into your application's database using pg_dump and piping it into psql with the below command:

pg_dump <source_database> -t <sourceTable> | psql <target_database>

EG:
pg_dump eve -t "\"invTypes\"" | psql eveVideoManagementSystem_development  

Note the escaped quotes around the table name. This is important because pg_dump doesn't automagically quote your table names. A successful run will print something like this:

 SET  
 SET  
 SET  
 SET  
 SET  
 SET  
 SET  
 SET  
 SET  
 CREATE TABLE  
 ALTER TABLE  
 COPY 22531  
 ALTER TABLE  
 CREATE INDEX  


Step Six: Change the table's owner

Copying using pg_dump will set the table's owner to 'eve'. Depending on your rails and database configurations, this will likely result in your rails app being unable to access the table. To fix this, change the tables owner, which can be done, once connected to the database, with the below command.

ALTER TABLE <tableName> OWNER TO <new owner's role>
EG:
 ALTER TABLE "invTypes" OWNER TO evevideomanagementsystem  


At this point you should be able to access the table using ActiveRecord in the traditional manner.

Going forward, as new SDEs are released, it should be possible to simply repeat step 5 & 6 to update your application's data to the newest release.


Monday, July 7, 2014

Respecting Owner Limits in Belongs To Associations

So, here's the scenario: I have a rails 4.0.4 application. This application has many shares, each share has many share_users which each have an 'approved' boolean attribute. However each share also has a user_limit. So how does one ensure that the number of a share's approved share_users does not exceed the share's user_limit?

If you just want the solution, skip down to Attempt Three.

Attempt One - Overwrite the share_user's approved=(value) method
New approved=(value) method:

 def approved=(value)  
      share = self.share  
      if value == true  
           if share.share_users.where("approved = true").count < share.user_limit  
                write_attribute(:approved, value)  
           else  
                raise Exceptions::ShareUserLimitReached  
           end  
      end  
      if value == false  
           write_attribute(:approved, value)  
      end  
 end  

Supporting share_user spec:

 describe "approved=(value) > " do  
           it {should respond_to :approved=}  
           let(:approved_share) {FactoryGirl.create(:basic_share, user_limit: 1)}  
           let!(:approved_share_user) {FactoryGirl.create(:share_user, share_id: approved_share.id, approved: true)}  
           let!(:disapproved_share_user) {FactoryGirl.create(:share_user, share_id: approved_share.id, approved: false)}  
           it "should not throw a ShareUserLimitReached error when the share's user_limit is not being exceeded" do  
                expect{share_user.approved=true}.not_to raise_error  
           end  
           it "should throw a ShareUserLimitReached error when the share's user_limit attempts to be exceeded" do  
                approved_share.user_limit = 2  
                approved_share.save  
                expect{disapproved_share_user.approved=true}.to raise_error Exceptions::ShareUserLimitReached  
           end  
           it "should allow share_users to be disapproved without issue" do  
                expect{approved_share_user.approved=false}.not_to raise_error  
           end  
      end  

Now, if you look at the code, you'll see a couple of things.

  • share_user#approved=(value) is not encapsulated. It's reaching through to the share, retrieving attributes from it and then executing upon those attributes.
  • share_user#approved=(value) raises a nasty error that then has to be handled specially everywhere it is likely to be invoked.
  • share_user#approved=(value) heavily complicates creating new share_users as it requires the share_user to already have a share at point of creation. This requires telling the relationship to retrieve it from memory as opposed to the database. More on this can be found here.
All in all, not a great solution, it's messy, doesn't quite do the job and requires modifying a lot of other code.

Attempt Two - validates_associated + validates_with
  1. Create a new validation for the Share called ApprovedShareUserCount and invoke it with validates_with.
  2.  class ApprovedShareUserCountValidator < ActiveModel::Validator  
          def validate(record)  
               #Required to ensure that the presence validation spec for user_limit doesn't trigger this validation to throw a nil error.
               return if record.user_limit.nil?  
    
               count = record.share_users.where("approved = true").count  
               if count > record.user_limit  
                    record.errors.add(:share_users, 'The share has reached its user limit, please contact your CEO or Tech Admin about this issue.')  
               end  
          end  
     end  
    
    Note: I stored my validators in a new folder; app/validators and added a custom eager loading path for it.
     class Share < ActiveRecord::Base       
          has_many :share_users, foreign_key: "share_id", dependent: :destroy  
          validates :user_limit, presence: true  
          validates_with ApprovedShareUserCountValidator  
     end  
    

  3. Invoke this new validation whenever a share_user is saved by adding 'validates_associated :share' to it.
  4.  class ShareUser < ActiveRecord::Base  
          belongs_to :share  
          validates :share_id, presence: true  
          validates_associated :share  
     end  
    
Supporting Share Spec
 describe "should ensure the user_limit is never exceeded" do  
                let!(:user_limit_share) {FactoryGirl.create(:share, user_limit: 1)}  
                let!(:approved_share_user) {FactoryGirl.create(:share_user, approved: true, share_id: user_limit_share.id)}  
                let!(:disapproved_share_user) {FactoryGirl.create(:share_user, approved: false, share_id: user_limit_share.id)}  
                subject{user_limit_share}  
                it "should be valid if the number of approved share_users is <= the share's user_limit" do  
                     should be_valid  
                end  
           end  

Supporting Share_User spec
 describe "validate_association :share" do  
      let!(:user_limit_share) {FactoryGirl.create(:share, user_limit: 1)}  
      let!(:approved_share_user) {FactoryGirl.create(:share_user, approved: true, share_id: user_limit_share.id)}  
      let!(:disapproved_share_user) {FactoryGirl.create(:share_user, approved: false, share_id: user_limit_share.id)}  
      it "should not be valid if approving it would exceed the share's user_limit" do  
           disapproved_share_user.approved = true  
           expect(disapproved_share_user.valid?).to_not be_true  
      end  
 end  

The allure of this is that it ties into the existing validation framework. Unfortunately, there is one critical problem; it doesn't work. The validations all trigger and work perfectly fine! But the share's validation is triggered against the database, and does NOT use the share_user in memory that triggered it.

So to use this solution, the share_user would have to be saved to the database, thus defeating the entire purpose.

Attempt Three - Share#respect_share?(share_user) + ShareUser#validates_with custom validator
Attempt two was definitely far more elegant then attempt one and was certainly moving in the right direction. So how do we build on that with something that respects encapsulation, can be easily expanded on and that uses the existing validation framework?


  1. Create a public method on Share that determines if a given share_user would violate the share's rules.
  2.  def respect_share?(share_user)  
               #Ensure the share's user_limit is respected.  
               if share_user.approved == true  
                    count = self.share_users.where("approved = true").count  
                    #This assumes that in the event this returns true, the share_user will be added to the share.  
                    #Thus it must count up, so that a share with 10 users and a 10 user limit will return false.  
                    if count+1<=self.user_limit  
                         return true  
                    else  
                         return false  
                    end  
               end  
               #This method can be expanded as the share grows.  
          end  
    
    At this point we only care about respecting the user_limit. However as you can see this can be easily expanded by breaking out the if statement (and any additional if statements) into supporting private methods.
  3. Create a custom validator for share_user that invokes the owning share's respect_share? method.
  4.  class RespectShareValidator < ActiveModel::Validator  
          def validate(record)  
               #Required to ensure that the presence validation spec for share_id doesn't trigger this validation to throw a nil error.  
               return if record.share_id.nil?  
               #This simply calls the record's owning share and then passes the record to it.  
               if record.share.respect_share?(record) == false  
                    record.errors.add(:share_users, '<Error Message Here>')  
               end  
          end  
     end  
    

Supporting Share Spec
 describe "Public Methods > " do  
      describe "respect_share?(share_user)" do  
           let!(:user_limit_share) {FactoryGirl.create(:share, user_limit: 1)}  
           let!(:approved_share_user) {FactoryGirl.create(:share_user, approved: true, share_id: user_limit_share.id)}  
           let!(:disapproved_share_user) {FactoryGirl.create(:share_user, approved: false, share_id: user_limit_share.id)}  
           it "should return false if the share's user_limit would be exceeded by saving the passed in share_user" do  
                disapproved_share_user.approved = true  
                expect(user_limit_share.respect_share?(disapproved_share_user)).to be false  
           end  
      end  
 end  

Supporting Share_User Spec
 describe "RespectShareValidator" do  
      let!(:user_limit_share) {FactoryGirl.create(:share, user_limit: 1)}  
      let!(:approved_share_user) {FactoryGirl.create(:share_user, approved: true, share_id: user_limit_share.id)}  
      let!(:disapproved_share_user) {FactoryGirl.create(:share_user, approved: false, share_id: user_limit_share.id)}  
      it "should not be valid if approving it would exceed the share's user_limit" do  
           disapproved_share_user.approved = true  
           expect(disapproved_share_user.valid?).to_not be_true  
      end  
 end  

Unlike Attempt Two this one actually works. This is because the current in-memory representation of the share_user is passed to Share#respect_share? and the method knows that it is dealing with an in memory model instead of a database record. Now while this knowledge isn't ideal, it is much cleaner and functional then either of the other two attempts.

The existing validation mechanisms are used, and there are no custom errors to be caught. This means that the majority of existing code will function without issue. On top of this, because all of the code necessary to determine if a Share would be broken is contained within the Share class itself, this solution doesn't violate encapsulation.


Sunday, April 20, 2014

The Application of Algorithms

Since January I've been taking an Algorithms class from Princeton. Through the marvel that is modern education and the internet, it is possible to learn such a complex subject from such a prestigious university for free. The pusher of this most elucidating drug is Coursera, and the course work is delightfully complicated; especially since it required dusting off my Java foo which had been withering away for the better part of two years now.

Unfortunately, there is one terrible catch;

"I will not make solutions to homework, quizzes or exams available to anyone else. This includes both solutions written by me, as well as any official solutions provided by the course staff. " - Coursera's Honor Code (EULA) [Emphasis Mine]

As you might imagine, this complicates the issue of sharing one's recent work. It is also why this post isn't full of GitHub links to complicated implementations I'm proud of.

Part One of this Algorithms class covers a fairly basic set of algorithms;

  • The foundations of algorithms
    • Speed, memory, order of growth, practical observations and jargon.
  • Union Find
    • A basic connection finding algorithm, not terribly powerful on its own, but a fundamental building block in algorithm theory and a sub routine in many more powerful algorithms.
  • Basic Sort Algorithms
    • A quick summation of the basic sorting algorithms: selection sort, insertion sort and shell sort.
  • Stacks & Queues
    • A more in depth look at the algorithms behind these basic data structures, this section looks at some fundamental OOP concepts with a Java flair.
  • Advanced Sorts
    • Quicksort, Mergesort and their various implementations. This section is really where the class gets good and starkly illustrates the strengths of good algorithms and the trade offs necessary to make them.
  • P-Queues and Symbol Tables
    • The opener to BSTs and Trees as a whole. As with all things in this class, the fundamental data structures beneath a common concept are explained in detail before the main data structure. In a lot of ways it feels like an opening act before the people you paid to see show up.
  • Advanced Trees
    • Balanced search trees and geometric applications of BSTs, specifically 1d,  Kd-trees and line interception, detection, and sort algorithms. This lesson is a real mind bender as it hints at the fundamental basis for machine vision technology.
  • Hash Tables 
    • The last week is spent covering hash tables, since they're really the ultimate form of Symbol Tables. Time is devoted to the core elements of hash table creation, the actual hashing and subsequent sorting and searching. This lesson actually prompted me to go looking through MRI Ruby's implementation of hashes.

While the lectures and assignments have prompted me to think in new ways, this part of the course left me feeling like the knowledge was somewhat difficult to apply. The abstractions discussed above, save maybe the sorting algorithms, are quite far removed from real world problems. The class' assignments were quite good at relating these algorithms back to real world problems, but after turning each one in, I found myself pondering how I could sneak the algorithm into a rails project in some nifty manner.

While Rails is used for many things, the ease with which one can build a simple web application is without a doubt its main strength. Unfortunately, there just aren't many problems in the web-app sphere that lend themselves to these basic algorithms. Sure sorting of data is important, but your database is far better at it then you. There just aren't many nifty ways to use the algorithms discussed above in most Rails apps, the algorithms are too low level. They deal with memory optimization, connectivity and data management, things that are all abstracted away from you by either Ruby, Rails, your database or the inherent nature of the MVC framework. In this situation, the strength of Rails is its greatest weakness.

The first part of the algorithms class ended in the middle of March, but that isn't where this story ends. There are more delightful algorithms to explore, and part two began a week later covering some of the more complicated, and applicable, algorithms used today.


  • Undirected and Directed Graphs
    • Graphs are one of my favorite abstractions. A graph is nothing more then a point of vertices and lines connecting them. The beauty of this structure is that it is flexible enough to represent anything from roads, to social networks, to points of failure in a circuit board. This first lesson focuses on building, searching, sorting and generally using graphs.
  • Minimum Spanning Trees & Shortest Path
    • This lesson guides the student into edge weighted graphs, and the big name algorithms, Kruskal and Prim, that dominate them. Another problem type is introduced, the 'Shortest Path Problem'. The core of problems is that they're a transferable abstraction that can be used to solve numerous different real world problems. The lesson wraps up with distinguishing cyclic vs a-cyclic graphs, Dijkstra's algorithm and the double edged sword of negative weighted edges. All in all this lesson series was my favorite.
  • Max Flow & Radix Sorts
    • Maximum Flow is another problem type, specifically it is the question of how much of something can transit a network from point A to B. As with all problem types, it transfers seamlessly from rail networks to phone lines to the internet itself. The answers to maxflow problems lie in the Ford-Fulkerson algorithm, which as you might've guessed, sits upon a graph. Radix Sorts are a series of different sorts used to manage Strings. Which despite their fundamental nature in all things programming, are quite a complicated construct to work with.
What comes next, I won't know for a little while, but this second part of the class has thus far proven itself to be much more exciting. Unlike the first course, the algorithms discussed above do not deal primarily with memory management or other low level concerns. Instead they leverage a mastery of these low level concepts to solve real world problems. This makes them prime candidates for being integrated into a Rails application. Yet, where does one go to find a problem that needs solving? Perhaps a mock travel agency with route planning using Dijkstra's algorithm? Maybe a game of some kind that uses Ford-Fulkerson to determine the maximum possible score?

Beyond the question of how to display my new found knowledge, there is a subtler subject to discuss; speed and clarity of reasoning. Many times a programmer will find themselves pondering whether they've determined all possible edge cases. Or whether an edge case exists in the first place. This process can consume hours, sometimes days, depending on how it is gone about. An uncaught edge case can lead to a subtle bug that only ever materializes once or twice in a half million queries. Working with these algorithms has improved my ability to see the possibilities of a given situation. That is the greatest boon of these classes. Sure the knowledge is nice, but being able to look at a problem and quickly make heads or tails of it, determine a solution and then work through the finer points of that solution is the real reward.

I highly recommend anyone who hasn't taken such a class to seek one out. The skills you'll develop are well worth the time.



Wednesday, November 20, 2013

Web Scraping in Ruby with Mechanize and Nokogiri

I recently received a code challenge during an interview process. Though I didn't get the job, I realized that though I've managed to build a rails application, I've not really solved a lot of problems with Ruby. This sent me searching for other problems, and while I found some nifty code challenges and general resources, the challenges weren't being maintained and the resources all seemed bland. It was at this point I realized that my friends still back in college are immersed in a world full of quality challenges.

I asked one of my friends, a young man in Louisiana, if he would be willing to share the prompt for his semester final project with me. While the class wasn't a Ruby or Rails class, it was a Database class and it was couched thoroughly in Java, JDBC and MySQL. This made it rather trivial to reimagine the project as a rails application and get to work. The prompt was essentially to build a travel agency flight booking tool. To meet the prompt's requirements on the scale I felt fitting I found myself needing a list of airlines, their destination airports and a list of airports.

Sure this data could be stubbed out, but the whole goal of this adventure was to solve problems in Ruby, so why not learn how to scrape web pages?

The Tools
 - Nokogiri (Mechanize pre-req) http://nokogiri.org/

Desired Data
  - Airport's name, city, country and IATA code.
  - Airline names and destination airport IATA codes.

The Targets
 - Airports
 - Airlines

Wikipedia of course contains a more comprehensive list of airports then flighradar24. However, as we'll discuss in this blog, the highly structured form of flightradar24 makes it MUCH easier to scrape compared to wikipedia.

Part One - Parsing FlightRadar24
This being the first time I've ever scraped a web page I found myself looking for data formats that I could easily understand. A quick look at the source for http://www.flightradar24.com/data/airports/ presents a very well defined unordered list being styled into the pretty web page.
1:            <ul id="countriesList">  
2:  <li>  
3:       <a title="Airports in Albania" href="/data/airports/Albania">  
4:            <div class="left"><img class="lazy" src="http://www.flightradar24.com/data/images/destination/flags-big/albania.gif" alt="Albania flag" /></div>  
5:            <div class="right">Albania airports</div>  
6:       </a>  
7:  </li>  
8:  <li>  
9:       <a title="Airports in Algeria" href="/data/airports/Algeria">  
10:            <div class="left"><img class="lazy" src="http://www.flightradar24.com/data/images/destination/flags-big/algeria.gif" alt="Algeria flag" /></div>  
11:            <div class="right">Algeria airports</div>  
12:       </a>  
13:  </li>  

Within the unordered list are individual list elements containing the link and some image and formatting information. Mechanize is not a full featured HTML parser like Nokogiri, but it does have built in functionality to find links, click links, and fill out forms. These helpers use Nokogiri to locate elements within the dom and interact with them. Unfortunately a lot of the Nokogiri interaction is abstracted away from the user in these methods and there is a loss of functionality as a result, though I am not skilled enough to speak as to how much.

In order to parse this page, I relied almost exclusively on Mechanize's links.find_all method. Unfortunately the source code and documentation that I've been able to find has absolutely no mention of this method.

1:  agent = Mechanize.new  
2:  page = agent.get('http://www.flightradar24.com/data/airports/')  
3:  country_links = page.links.find_all { |l| l.attributes.parent.name == 'li'}  

For each link found on the page, the parent attribute's name is checked against the string. In this case if the parent element is a list element the link will be saved. We know from above that each link we want is wrapped by a list element, but we haven't proven that there are not other links on the page matching this pattern. Instead the assumption is made that this pattern is accurate enough and we proceed to the next step; clicking the links contained within country_links.

1:  country_links.each do |c|  
2:       if country_links.index(c) > 19  
3:            country = c.click  
4:            airport_links = country.links.find_all { |d| d.attributes.parent.name == 'div'}  

In the above snippet at line 2 you can see the result of the "accurate enough" assumption made earlier. I encountered an error where I was not being routed to the country pages as expected. By outputting the link and index of the array I was able to simply add a branch to ignore the first 20 indexes. While adding an if branch is not a terribly inefficient thing, ascertaining what condition to set is not always as easy.

The link is then clicked and the resulting page is passed to the variable country. Notice that Mechanize is smart enough to correctly handle the partial link being passed to it. Country is then parsed for links and the same pattern is executed against it except looking for links with a div element as an immediate parent this time. The resulting links are then added to the airport_links array and we move on to the next step.

1:  airport_links.each do |a|  
2:                 if airport_links.index(a).even?  
3:                      airport = a.click  
4:                      doc = airport.parser  

At line two we once again see the consequences of this method of data acquisition. I repeated the same troubleshooting method and realized that every even index had valid data. As a result we have another almost random if branch. With valid data in hand, the link is clicked, however this time we aren't looking for links; the page is instead passed to the doc variable after being run through Mechanize's .parser method. The doc variable may now be treated as a Nokogiri object, the full spectrum of Nokogiri methods, from .css to xpath may be run against it.

Airport Source
1:  <div class="lightNoise" style="padding: 6px 12px 12px; width: 225px;">  
2:                 <h3 class="noborder">Information</h3>  
3:                 <ul>  
4:                      <li><strong>Airport name: </strong>Albuquerque International Sunport</li>  
5:                      <li><strong>IATA code: </strong>ABQ</li>  
6:                      <li><strong>ICAO code: </strong>KABQ</li>  
7:                      <li><strong>Country: </strong>United States</li>  
8:                      <li><strong>State: </strong>NM</li>                    <li><strong>City: </strong>Albuquerque</li>  
9:                      <li><strong>Time: </strong>17:44 (UTC: 00:44)</li>  
10:                      <li><strong>Latitude: </strong>35.040218</li>  
11:                      <li><strong>Longitude: </strong>-106.609001</li>  
12:                      <li><strong>Altitude: </strong>5355 feet</li>                    <li><strong>Homepage: </strong><a target="_blank" title="Visit homepage" href="">Visit homepage</a></li>  
13:                      <li><strong>Airport info: </strong><a target="_blank" title="Visit Flightstats" href="http://www.flightstats.com/go/Airport/airportDetails.do?airportCode=ABQ">Visit Flightstats</a></li>  
14:                      <li><strong>More info: </strong><a target="_blank" title="Visit Great Circle Mapper" href="http://gc.kls2.com/airport/ABQ">Visit Great Circle Mapper</a></li>  
15:                 </ul>  
16:            </div>  

Parser
1: doc.css('div.lightNoise li').each do |l|  
2:                           if doc.css('div.lightNoise li').index(l) == 0  
3:                                temp_storage[:name] = l.text.slice(14, l.text.length)  
4:                                #puts temp_storage[:name]  
5:                           end  
6:                           if doc.css('div.lightNoise li').index(l) == 1  
7:                                temp_storage[:i_code] = l.text.slice(11, l.text.length)  
8:                                #puts temp_storage[:i_code]  
9:                           end  
10:                           if doc.css('div.lightNoise li').index(l) == 3  
11:                                temp_storage[:country] = l.text.slice(9, l.text.length)  
12:                                #puts temp_storage[:country]  
13:                           end  
14:                           if doc.css('div.lightNoise li').index(l) == 4  
15:                                temp_storage[:city] = l.text.slice(6, l.text.length)  
16:                                #puts temp_storage[:city]  
17:                           end  
18:                      end  
19:                      puts temp_storage[:name]  
20:                      storage.push(temp_storage.clone)  

Using Nokogiri's .css selectors parsing an airport page is relatively trivial. In line one of Airport Source you can see there is a div with the class 'lightnoise'. Contained within this div at lines 4, 5,7 and 8 is Airport name, IATA code, Country and City.

In line one of the parser you can see that the selector follows the convention of delineating between classes and ID's with a period for classes, a hash mark is used for IDs. The parser also accepts child elements delineated by a space. You can see this at work in line one as well where I built an array of all list elements within div.lightnoise.  This array was then iterated through and since the list elements are ordered identically from page to page it was possible to extract data based on index location. The first part of the returned text data is always the same as well, so the excess was removed using ruby's string.slice method. This returned only the dynamic data; in this case the name "Albuquerque International Sunport", IATA code "ABQ", city "Albuquerque", and country "United States".

While the target data was successfully retrieved, the methods used to retrieve target links in this section are undeniably inefficient and error prone. Requiring two different types of branches to ensure the right interactions isn't necessarily terrible, but on a larger project this would have been much more unwieldy.

Source
Scraper: https://github.com/islador/mechanize_scrapers/blob/master/v2.rb
Output: https://github.com/islador/mechanize_scrapers/blob/master/seed/airports.yml


Part Two - Parsing wikipedia

Taking the experience I garnered parsing FlightRadar24 I began work on gathering the Airline data I would need. The best source I found for the data was Wikipedia, specifically http://en.wikipedia.org/wiki/Airline_codes-All . Fortunately the start point for this scrape is a highly regimented table.

As before we start with instantiating a Mechanize instance and visiting a webpage.

1:  agent = Mechanize.new  
2:  begin  
3:       page = agent.get('http://en.wikipedia.org/wiki/Airline_codes-All')  
4:  rescue Mechanize::ResponseCodeError => e  
5:       puts e.to_s  
6:  end  

You'll notice that this time the process is wrapped in some basic error handling. Mechanize has a few error types you can find more detail on by checking the source. This specific rescue block catches 404 and other HTTP response codes.

From there, the page is parsed and sent to a helper function.

1:  airline_code_parser = page.parser  
2:  storage = []  
3:  destination_airports = []  
4:  #find all non-red airline links in the table  
5:  airlines = extract_column_airlines(airline_code_parser, "Airline")  

1:  def extract_column_airlines(page, column_name)  
2:       airport_links = []  
3:       table_width = 0  
4:       airport_index = 0  
5:       #parse the table head and return the index of the "Airport" column as well as the total column count.  
6:       page.css('table.toccolours.sortable th').each do |c|  
7:            if c.text.eql?(column_name)  
8:                 airport_index = page.css('table.toccolours.sortable th').index(c)  
9:                 table_width = page.css('table.toccolours.sortable th').length  
10:                 break  
11:            end  
12:       end  
13:       #parse the table rows  
14:       page.css('table.toccolours.sortable tr').each do |tr|  
15:            #for each each row, parse the columns  
16:            tr.css('td').each do |td|  
17:                 #if the column's index modulous the width is equal it to the desired index  
18:                 if tr.css('td').index(td)%table_width == airport_index  
19:                      #retrieve the link within that column.  
20:                      td.css('a').each do |a|  
21:                           if a['href'].to_s.include?("redlink=1") == false  
22:                                airport_links.push(a['href'].clone)  
23:                           end  
24:                      end  
25:                 end  
26:            end  
27:       end  
28:       return airport_links  
29:  end  

The helper function takes two arguments, a Nokogiri object and a string. The method then locates the table with class="tocolours sortable" and iterates through the table header elements. When it finds a header matching the supplied string it saves the index of that header as well as the length of the header array being iterated through.

The method then iterates through each row in the table extracting the links contained within the column matching the index retrieved earlier. You'll notice that "redlink=1" results in the link being discarded. Links with "redlink=1" in the href are known bad links, so by doing this a large portion of the links in the table are automatically discarded as they're known to lead nowhere. Links that are not discarded are pushed onto the array airport_links and the array is returned at the end of the method.

1:  airlines = extract_column_airlines(airline_code_parser, "Airline")  
2:  #iterate through those links  
3:  airlines.each do |d|  
4:       puts d  
5:       #Visit each airline page  
6:       begin  
7:       airline = agent.get('http://en.wikipedia.org' + d)  
8:       rescue Mechanize::ResponseCodeError, StandardError => e  
9:            puts "Error fetching airline pages: " + e.to_s  
10:       end  
11:       #Build the nokogiri object  
12:       airline_parser_object = airline.parser  
13:       #Extract the airline name from the page title by trimming off the wikipedia suffix  
14:       airline_name = airline_parser_object.title.slice(0, airline_parser_object.title.length-35)
15:       #search for a main destinations article
16:       main_dest_page = find_main_destinations(airline_parser_object)

The returned value is then cast to an array that is then iterated through. You'll notice that in this situation we will never have to visit the source page again as we've already retrieved all useful information from it. Each link is then appended to the domain and visited, in line 8 there is also the addition of StandardError to the rescue block. This is done because Net will throw a getaddrinfo error if d begins with 'http://', so rather then sanitize the data, I elected to catch the error. Then a nokogiri object is made, the Airport's name is extracted from the title and the nokogiri object is passed to the helper method below.

1:  def find_main_destinations(page)  
2:       storage = ""  
3:       page.css('div.rellink a').each do |ad|  
4:            if ad['href'].slice(ad['href'].length-13, ad['href'].length).eql?("_destinations")  
5:                 #returns a string  
6:                 storage = storage + ad['href']  
7:                 break  
8:            end  
9:       end  
10:       return storage  
11:  end  

The find_main_destinations method was built in response to the fact that not all airline pages on wikipedia have a list of destination airports. Some have a separate page linked in the destinations section that links to a list or table of airports. Fortunately all of these relevant links are wrapped in divs with a class of 'rellink' so isolating them is fairly straight forward. I then compared the last thirteen characters in the href and if they were equal to "_destinations" the href is appended to storage. In this manner if a proper rellink is not found an empty string is returned by the method.

1:  if main_dest_page.empty?  
2:            puts "Main dest not found."  
3:            h2_index = 0  
4:            #Search for the destinations section.  
5:            airline_parser_object.css('div#mw-content-text h2').each do |c|  
6:                 #puts "Parsing links"  
7:                 if c.css("span.mw-headline").text.eql?"Destinations"  
8:                      h2_index = airline_parser_object.css('div#mw-content-text h2').index(c)  
9:                 end  
10:            end  

There are two common situations, a different page containing the destinations and the same page containing the destinations. In the event that main_dest_page is empty, meaning there is no separate page, it is necessary to check for a destinations section and if found, to check it for destination airports. Line 5 checks for destination by searching for each h2 within the div mw-content-text, which is the primary content div used by wikipedia. Each found h2 is then further explored by checking the child span mw-headline element's content. If it is equal to "Destinations" then the index is saved as h2_index.

1:  if h2_index != 0  
2:                 destinations = extract_links(["h2"], h2_index, airline_parser_object) 

Assuming h2_index isn't still zero, it, along with an array of delimiter tags and the current nokogiri object are passed to the extract_links method. Be aware that in the unlikely event that an h2_index of 0 is actually the real index of the destination h2, the destination section will not be processed.

1:  def extract_links(delim_tags, index, page)  
2:       extract_ranges = [index...index+1]  
3:       doc = page  
4:       extracted_links = []  
5:       i = 0  
6:       # Change /"html"/"body" to the correct path of the tag which contains this list  
7:       (doc/"html"/"body"/"div").children.each do |el|  
8:        if (delim_tags.include? el.name)  
9:         i += 1  
10:        else  
11:         extract = false  
12:         extract_ranges.each do |cur_range|  
13:          if (cur_range.include? i)  
14:           extract = true  
15:           break  
16:          end  
17:         end  
18:         if extract  
19:              el.children.each do |d|  
20:                   d.css("a").each do |k|  
21:                        #destination = agent.get('en.wikipedia.org' + k['href'])  
22:                        extracted_links.push(k['href'].clone)  
23:                   end  
24:              end  
25:         end  
26:        end  
27:       end  
28:       return extracted_links  
29:  end  

The extract_links method is a modified version of Dan Healy's code at Stack Overflow.  This code pulls each link contained between the passed in index and the next index of the delimiter tag. This is accomplished by first navigating to the location of the h2's, and iterating through the child elements. Each child is then checked if it matches the delimiter, if it does, i is incremented. If it does not, i remains the same and extract_ranges are iterated through. Should the extract_ranges include i, then extract is set to true and the extract_ranges.each loop is broken. If extract is set to true, then for each child,  search for a link and push that link to the extracted_links array. Repeat until i is no longer found within the extract_range. The method then returns the extracted_links array.

This means that all links between two h2's are returned. This methodology is not very precise, but it ensures that all links within the Destinations section of a page will be checked.

1:  destinations.each do |d|  
2:                      begin  
3:                           #puts "Querying page: " + d  
4:                           page = agent.get('http://en.wikipedia.org' + d)  
5:                           nokogiri_page = page.parser  
6:                           #store that IATA code  
7:                           destination_airports.push(extract_iata_code(nokogiri_page))  
8:                      rescue Mechanize::ResponseCodeError, StandardError => e  
9:                           puts "Error fetching airline destination IATA codes: " + e.to_s  
10:                      end  
11:                 end  
12:            end  

The returned links are then iterated through. Each link is visited and if the page doesn't throw an error, the page is parsed into a nokogiri object and passed to the extract_iata_code method.

1:  def extract_iata_code(page)  
2:       page.css('th a').each do |d|  
3:            if d['href'].eql?("/wiki/International_Air_Transport_Association_airport_code")  
4:                 return d.next_element.text  
5:                 break  
6:            end  
7:       end  
8:  end  

The extract_iata_code method searches the page for table headers with links. For each one found, it checks if the href matches the wikipedia page of IATA. If that link is found, then the next element's text is returned. You can see the table being parsed on the right of this page. Each returned value is pushed onto the destination_airports array and this branch ends.

On the next branch, airport pages with main destination pages are handled.

1:  else  
2:            puts "Main dest found."  
3:            begin  
4:                 airline_destination = agent.get('http://en.wikipedia.org' + main_dest_page)  
5:                 airline_destination_parser = airline_destination.parser  
6:                 airport_links = extract_column_airports(airline_destination_parser, "Airport")  

This branch is called when the main_dest_page variable is not empty. The page contained in the variable is visited and a nokogiri object is made. That nokogiri object is then passed to the extract_column_airports method along with the name of the column to be parsed for links, in this case "Airport".

1:  def extract_column_airports(page, column_name)  
2:       airport_links = []  
3:       table_width = 0  
4:       airport_index = 0  
5:       #parse the table head and return the index of the "Airport" column as well as the total column count.  
6:       page.css('table.wikitable.sortable th').each do |c|  
7:            if c.text.eql?(column_name)  
8:                 airport_index = page.css('table.wikitable.sortable th').index(c)  
9:                 table_width = page.css('table.wikitable.sortable th').length  
10:                 break  
11:            end  
12:       end  
13:       #parse the table rows  
14:       page.css('table.wikitable.sortable tr').each do |tr|  
15:            #for each each row, parse the columns  
16:            tr.css('td').each do |td|  
17:                 #if the column's index modulous the width is equal it to the desired index  
18:                 if tr.css('td').index(td)%table_width == airport_index  
19:                      #retrieve the link within that column.  
20:                      td.css('a').each do |a|  
21:                           airport_links.push(a['href'].clone)  
22:                      end  
23:                 end  
24:            end  
25:       end  
26:       return airport_links  
27:  end  

The extract_column_airports method is functionally identical to the extract_column_airlines method detailed above. The one major difference is that it targets a different class of table element, the "wikitable" instead of the "toccolours" table.

1:  airport_links.each do |d|  
2:                      begin  
3:                           page = agent.get('http://en.wikipedia.org' + d)  
4:                           nokogiri_page = page.parser  
5:                           #store that IATA code  
6:                           destination_airports.push(extract_iata_code(nokogiri_page))  
7:                      rescue Mechanize::ResponseCodeError, StandardError => e  
8:                           puts "Error fetching airport IATA codes from main destination article: " + e.to_s  
9:                      end  
10:                 end  
11:            rescue Mechanize::ResponseCodeError, StandardError => e  
12:                 puts "Error fetching main airline destination article: " + e.to_s  
13:            end  
14:       end  

Each returned airport_link is then iterated through and the IATA code is extracted in the same manner as above. IATA codes are then pushed onto the destination_airports array and the destination handling if branch ends.

1:  storage.push({name: airline_name.clone, destinations: destination_airports.clone})  
2:       #then reset the destination airports array to empty.  
3:       destination_airports = []  
4:       if airlines.index(d)%50 == 0  
5:            puts "Writing airlines#{airlines.index(d)}"  
6:            FileUtils.mkdir_p "./seed/"  
7:            File.open("./seed/airlines#{airlines.index(d)}.yml",'w') do |out|  
8:            YAML.dump(storage, out)  
9:            end  
10:       end  
11:  end  
12:  puts "Converting final array to yaml."  
13:  FileUtils.mkdir_p "./seed/"  
14:  File.open("./seed/airlines.yml",'w') do |out|  
15:       YAML.dump(storage, out)  
16:  end  

Within the airlines loop, after the destination_airports array has been built for the current airline, a hash is built consisting of the name of the airline and the destinations the airline flies to. This hash is then pushed onto the storage array, so the storage array consists of hashes made up of a string and an array containing strings. The destination_airports array is then reset to empty. This prevents writing IATA codes from all airlines parsed to each airline's destination.

At every fifty airlines scraped, the storage array is output to a YAML file. This file is dynamically renamed, so there is heavy data duplication. However this allows the process to error out or otherwise fail and for the user to easily restart it from a recent point in the airlines array by adding a simple if block. Considering that this scraper will have to visit in excess of 15,000 web pages, I consider this a worthwhile use.

Lastly, once the scraper finishes and the airlines loop exits, a master airlines.yml is saved to the drive.

Sources
Scraper: https://github.com/islador/mechanize_scrapers/blob/master/airline_scraper_v2.rb
Output: https://github.com/islador/mechanize_scrapers/blob/master/seed/airlines.yml

Monday, November 4, 2013

Asynchronously Rendering Partials with custom jQuery/AJAX and Rails 3.2.14

I recently spent a while learning how to do custom AJAX queries with jQuery and Rails 3.2.14. While I'm well within the realm of a Junior Rails developer, I knew NOTHING about jQuery and only slightly more about AJAX. This post is going to tell a story, but I'm also going to show you some things I learned and haven't seen well documented elsewhere.

While researching and rapid prototyping code, I realized that there are a lot of pitfalls when you're glueing rails and jquery's AJAX functionality together. The biggest pitfall is silent failure in your .js.erb file, but there are plenty of others along the way, which Steve Schwartz over at AlfaJango.com has a great guide on. As I compiled information from the various sources, I realized the documentation for rails AJAX functionality is firmly couched in form_for and "remote: true" in particular. In fact, "remote: true" works for links, buttons, and even form_for's cousin, form_tag. Sadly "remote: true" doesn't work for anything custom, so almost all the documentation out there leaves you wondering. Sure there are some good guides out there like Steve Schwartz's over at AlfaJango.com but even they leave some of the components out.

Custom AJAX methods rely on several components in rails 3.2.14; the parent view, the jQuery & AJAX component itself, the controller method, the supporting .js.erb file and the partial being injected into the page. I won't go so far as to say these are all necessary and this is the only way, but these are the pieces that had to come together for me. These combined to form a nice little loop complete with silent failures and some very loud failures.

Using jQuery, an AJAX call may be made by any function for any reason, this is delightfully powerful, but can also be troublesome. I was trying to fire an AJAX call when the user changed the value in a select box. This required that the parent view containing the select boxes render first. You can see an example select box below taken directly from my app.

1:  <select id="station_select">  
2:    <% station_list = current_user.market_item_summaries.pluck('station_id').uniq %>  
3:    <option value="All">All Stations</option> <!-- The top option is the default option -->  
4:    <% station_list.each do |s|%>  
5:     <option value="<%= s.to_s %>"> <%=Station.where("station_id = ?", s).pluck("name")[0]%> </option>  
6:    <% end %>  
7:   </select>  

So this select box is rendered when the user visits the parent view. You'll see that it has an 'id' value and each option has a dynamic value. The 'id' value is standard HTML and is used by the jQuery to determine what to watch. jQuery can also watch css classes and a few other things, but I prefer ids personally. In order to fire off a jQuery AJAX call I elected to use the 'change' bind, though jQuery has a pile of really nifty events you can use for your functions. So when a user selects an option, the jQuery below is invoked by the browser.

1:  $(document).ready(function(){  
2:      //On change, trigger this function  
3:      $("#station_select, #listing_character, #owner, #type").change(function(){  
4:          //Load all select's values into variables  
5:          var ss = $("#station_select").val();  
6:          var ls = $("#listing_character").val();  
7:          var ow = $("#owner").val();  
8:          var ty = $("#type").val();  
9:          //Fire an AJAX call to marketsummaries/filter  
10:        $.ajax({  
11:            url: "marketsummaries/filter", type: "GET",  
12:            //Pass in each variable as a parameter.  
13:            data: { station_id: ss,   
14:                listing_character_id: ls,   
15:                owner_id: ow,   
16:                type: ty }  
17:        });  
18:      });  
19:  });  

You can see the comments explaining the cycle of events in rough detail, but let's break it down even more.
1:  $(document).ready(function(){  
2:  });  

The above function wraps our entire jQuery function set. This ensures that the document is fully loaded before any functions are called. While this may seem ancillary at first, ensuring the page is fully loaded is an excellent means of heading off bugs at the pass.

1:      $("#station_select, #listing_character, #owner, #type").change(function(){  
2:      });  

After the document is ready, you see the function declaration. jQuery distinguishes itself from JavaScript with a dollar sign before all functions. Knowing this you can see that my jQuery function actually includes several plain JavaScript elements. That aside, within my function declaration you can see that I am calling the selector "#station_select", jQuery uses the number sign to distinguish an id from a class, which would be represented with a dot. ".station_select" for class vs "#station_select" for id. You also see that a comma may be used as a delineator between multiple selectors. Each selector in the above snippet selects an id. The function is then bound to the change event, so when the value of either of the <select>'s changes, this function will fire.

1:          var ss = $("#station_select").val();  
2:          var ls = $("#listing_character").val();  
3:          var ow = $("#owner").val();  
4:          var ty = $("#type").val();  

In the above snippet you can see a combination of regular Javascript and jQuery. "var ss =" is JavaScript, declaring a variable, meanwhile everything right of the equals operator is jQuery. Each variable is assigned the current value of the <select> with the listed id. While this may be rather straight forward, the fact that jQuery and JavaScript can be so easily intermingled may not be as forthcoming as one might expect.

1:        $.ajax({  
2:            url: "marketsummaries/filter", type: "GET",  
3:            //Pass in each variable as a parameter.  
4:            data: { station_id: ss,   
5:                listing_character_id: ls,   
6:                owner_id: ow,   
7:                type: ty }  
8:        });  

Here you can see the actual AJAX call itself. You'll notice that ajax is actually a function within jQuery. The ajax function has a series of arguments that it takes, the primary arguments are; 'url', 'type', 'data' and 'success', the last of which I will explain in greater detail at the end. All arguments are comma delineated. The url arg determines where the AJAX call will be pointed towards, you'll notice this makes it very simple to dovetail it with a controller method. The type arg supports all four REST types and defaults to a GET call, I chose to explicitly define it. The data arg takes a comma delineated hash and will pass the data as query string parameters to the target url. Once again, you'll notice that this dovetails quite well with rails controllers.

1:            data: { station_id: ss,   
2:                listing_character_id: ls,   
3:                owner_id: ow,   
4:                type: ty }  

Each variable is passed as an element of the hash, this means that 'ss' will be accessible in the controller as params[:station_id]. The exact query passed to the server will look like;
1:  marketsummaries/filter?station_id=60000025&listing_character_id=All&owner_id=All&type=All  

Once the AJAX call is made, your controller must be waiting to receive it. This requires that you have a supporting route in your routes.rb file and a properly named method in your controller.
1:  routes.rb  
2:  match "/marketsummaries/filter", to: 'MarketItemSummaries#filter', via: :get  

1:  market_item_summaries_controller.rb  
2:  def filter()  
3:          @input = {"station_id" => nil, "listing_character_id" => nil, "owner_id" => nil, "type" => nil}  
4:          @input["station_id"] = params[:station_id]  
5:          @input["listing_character_id"] = params[:listing_character_id]  
6:          @input["owner_id"] = params[:owner_id]  
7:          @input["type"] = params[:type]  
8:
9:          (Whatever your method needs to do, build a query, what have you.)
10:
11:          respond_to do |format|  
12:              format.js  
13:          end  
14:      end  

Without the routes entry, the rails router will reject the AJAX call with a 403 forbidden leaving your controller's method untouched. The controller method itself must share the name specified in the AJAX call's url. Above you can see me taking each parameter and passing it to an instance hash. From there I can work with the hash more easily then individual parameters.

1:          respond_to do |format|  
2:              format.js  
3:          end  

This last function is extremely important and often missed. "respond_to" allows you to specify what type of response the controller is to send to a given call. You can specify html, xml, pretty much any format. In this case, the controller is going to respond with JavaScript. Because there is no additional block after "format.js", rails defaults to checking for a file "<method_name>.js.erb", in this case "filter.js.erb".

1:  filter.js.erb  
2:  $('#corporationtable').empty().append("<%= escape_javascript(render(:partial => 'corporation_table', locals: {mis: @mis})) %>");  

Like .html.erb files, .js.erb files are run through the embedded ruby pre-processor. This means you can dynamically embed different JavaScript commands into the page before it is actually run through the user's browser. This particular .js.erb file has some important parts though;

1:  $('#corporationtable')  

The above component uses jQuery to target a specific id. While it is possible to target an id on a partial, the jQuery selectors break when dom elements are reloaded through AJAX. For this reason "#corporationtable" is a div on the parent view, not on the partial that is going to be loaded.

1:  .empty().append()  

Normally when you go looking for proper syntax in a jQuery/AJAX .js.erb file, you'll see .html(), not .empty().append(). During my research I stumbled across an excellent, if tangential, blog post by Joe Zim explaining that while .html() does work, the reason it works is not officially supported. This means the jQuery team may randomly patch that functionality out and not tell anyone, so like him, I've elected to use .empty().append() directly rather then through .html(). It is important to note that .empty() destroys all child elements within the specified dom element.

1:  .append("<%= escape_javascript(render(:partial => 'corporation_table', locals: {mis: @mis})) %>");  

The .append() method takes html dom elements as well as html strings as arguments. Combining this with erb to use rails' render to generate a partial and append it to the div specified earlier allows for any element of the page to be updated. You'll notice that the render itself is wrapped in "escape_javascript()". This is done because the majority of partials being rendered will contain html elements that would otherwise escape the double quotes that .append() uses to delineate between start and finish. Kikito over at Stack Overflow does an excellent job explaining this.

1:  locals: {mis: @mis}  

The last crucial element within the .js.erb file is "locals:". The above code takes the variable "@mis" from my controller and makes it available to the partial as "mis". There are two key reasons to use locals, firstly; when rendering a partial in multiple places locals allow you to ensure you're passing it all necessary variables. Secondly; a partial being loaded into an existing view will not have access to the variables that the parent view does.

With filter.js.erb in place, the last necessary element is the partial itself. I want my users to be able to see the partial before selecting options from the <select> boxes, so the partial is actually loaded as an element within the parent view; show.html.erb.

1:  show.html.erb  
2:  <div id="corporationtable">  
3:   <%= render :partial => 'corporation_table', locals: {mis: @mis}%>  
4:  </div>  

As discussed earlier in filter.js.erb, "corporationtable" is the target of the jQuery append() function. This means the new partial is rendered in place of the old one. The partial itself can be seen below.

1:  
2:  <div class="corptable">  
3:   <table class="table table-bordered">  
4:    <thead>  
5:     <tr>  
6:      <h2>Corporation Item Summaries</h2>  
7:     </tr>  
8:    </thead>  
9:    <tr>  
10:     <table class="table table-condensed" style="border-collapse:collapse;"> <!-- Accordioned Table -->  
11:      <tbody id="corp_tbody">  
12:       <% mis.each do |n|%>  
13:       <% if n.entity == 1 %>  
14:         <tr data-toggle="collapse" data-target=".demo<%=mis.index(n)%>" data-target="#data4" class="accordion-toggle"> <!-- First 2 rows are the mis -->  
15:           <td><b> <%= Item.where("type_id = ?", n.type_id).pluck('name')[0] %> </b></td>  
16:           <td> <%= Station.where("station_id = ?", n.station_id).pluck('name')[0] %> </td>  
17:           <td> <%= Character.where("char_id = ?", n.char_id).pluck('name')[0] %> </td>  
18:           <td class="text-success"> <% char = current_user.characters.where("char_id = ?", n.char_id) %>  
19:            <%= Corporation.where("character_id = ?", char).pluck('name')[0] %> </td>  
20:           <td <% if n.bid == false %>   
21:              class="text-success">Sell  
22:             <% else %>  
23:              class="text-error">Buy  
24:             <% end %> </td>  
25:         </tr>  
26:         <tr data-toggle="collapse" data-target=".demo<%=mis.index(n)%>" data-target="#data4" class="accordion-toggle"> <!-- Second line of the mis -->  
27:           <td> APP <%= n.average_purchase_price %> </td>  
28:           <td> ASP <%= n.average_sale_price %> </td>  
29:           <td> APM <%= n.average_percent_markup %> </td>  
30:           <td> TVE <%= n.total_vol_entered %> </td>  
31:           <td> TVR <%= n.total_vol_remaining %> </td>  
32:         </tr>  
33:         <% mo = MarketOrder.where("market_item_summary_id = ?", n.id) %>  
34:         <% mo.each do |m| %>  
35:          <tr> <!-- Each Row after that with TD's classed as hiddenRow are individual orders -->  
36:           <td class="hiddenRow">  
37:             <div class="accordian-body collapse demo<%=mis.index(n)%>"> <%= Item.where("type_id = ?", m.type_id).pluck('name')[0] %></div>  
38:           </td>  
39:           <td class="hiddenRow">  
40:             <div class="accordian-body collapse demo<%=mis.index(n)%>"> <%= m.price %></div>  
41:           </td>  
42:           <td class="hiddenRow">  
43:             <div class="accordian-body collapse demo<%=mis.index(n)%>">%markup</div>  
44:           </td>  
45:           <td class="hiddenRow">  
46:             <div class="accordian-body collapse demo<%=mis.index(n)%>"> <%= m.vol_entered %></div>  
47:           </td>  
48:           <td class="hiddenRow">  
49:             <div class="accordian-body collapse demo<%=mis.index(n)%>"> <%= m.vol_remaining %></div>  
50:           </td>  
51:          </tr>  
52:         <% end %>  
53:        <% end %>  
54:       <% end %>  
55:      </tbody>  
56:     </table>  
57:    </tr>  
58:   </table>  
59:  </div>  

While the partial itself is nothing special, there is one element I would like to draw attention to. Line 11 contains a table body element with the id "corp_tbody". If you look over the code, you'll realize that if there are no valid mis values, corp_tbody will be entirely blank. The architecture of my app does not allow this entire partial to be easily removed. However it is possible to use jQuery and the .ajax() function's success method to hide the entire partial any time corp_tbody is empty.

1:  $(document).ready(function(){  
2:      //On change, trigger this function  
3:      $("#station_select, #listing_character, #owner, #type").change(function(){  
4:          //Load all select's values into variables  
5:          var ss = $("#station_select").val();  
6:          var ls = $("#listing_character").val();  
7:          var ow = $("#owner").val();  
8:          var ty = $("#type").val();  
9:          //Fire an AJAX call to marketsummaries/filter  
10:        $.ajax({  
11:            url: "marketsummaries/filter", type: "GET",  
12:            //Pass in each variable as a parameter.  
13:            data: { station_id: ss,   
14:                listing_character_id: ls,   
15:                owner_id: ow,   
16:                type: ty },  
17:            success: hider  
18:        });  
19:      });  
20:      function hider() {  
21:          //Hide or Show table divs based on whether the table's tbody has content.  
22:          if ($.trim($("#corp_tbody").html())=='') {  
23:              $("#corporationtable").hide()  
24:          }  
25:          if ($.trim($("#corp_tbody").html())!='') {  
26:              $("#corporationtable").show()  
27:          }  
28:          if ($.trim($("#char_tbody").html())=='') {  
29:              $("#charactertable").hide()  
30:          }  
31:          if ($.trim($("#char_tbody").html())!='') {  
32:              $("#charactertable").show()  
33:          }  
34:      }  
35:  });  

At line 20 you can see I've built a JavaScript function and taken advantage of jQuery's easy interoperability with JavaScript. The hider() function calls a series of JavaScript if branches (jQuery actually lacks an if/else mechanism entirely), within the first branch at line 22 I check for corp_tbody being empty. While jQuery does have couple variants on an empty method, according to Serge Shultz over at Stack Overflow, Chrome and Firefox don't play nice with those.

Up at line 17 you can see a very simple addition to the AJAX args list. "success: hider" calls the hider function when the AJAX request completes successfully. That is when the server returns an HTTP 200 to the client. Should the client not receive a 200 code, hider() will never fire. It is important to note that there are no parenthesis in "success: hider", I don't know why that is, but I do know that the function fails to perform reliably with parenthesis.

TL;DR
Asynchronously rendering a partial with jQuery/AJAX/Rails 3.2.14 requires a jQuery AJAX call, supporting routes, a controller with configured method, a properly formatted .js.erb file, a partial, and a target div on the parent view.

The AJAX call must be triggered, the URL it hits must be a valid route, the targeted controller method must be configured to respond with a JavaScript response, this will return a .js.erb file named <method>.js.erb which must contain a target div id and a properly escaped render command. The target div id should be present on the view that triggered the AJAX call, if it is not the partial might not reliably generate.

You can view the full code for my app's jQuery/AJAX/Rails interactions at the following links
routes.rb
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/config/routes.rb
Market Item Summaries Controller
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/controllers/market_item_summaries_controller.rb
Market Item Summaries JavaScript
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/assets/javascripts/market_item_summaries.js
Parent View
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/views/market_item_summaries/show_msi.html.erb
Corporation Table Partial
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/views/market_item_summaries/_corporation_table.html.erb
Character Table Partial
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/views/market_item_summaries/_character_table.html.erb
filter.js.erb
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/views/market_item_summaries/filter.js.erb