Wednesday, November 20, 2013

Web Scraping in Ruby with Mechanize and Nokogiri

I recently received a code challenge during an interview process. Though I didn't get the job, I realized that though I've managed to build a rails application, I've not really solved a lot of problems with Ruby. This sent me searching for other problems, and while I found some nifty code challenges and general resources, the challenges weren't being maintained and the resources all seemed bland. It was at this point I realized that my friends still back in college are immersed in a world full of quality challenges.

I asked one of my friends, a young man in Louisiana, if he would be willing to share the prompt for his semester final project with me. While the class wasn't a Ruby or Rails class, it was a Database class and it was couched thoroughly in Java, JDBC and MySQL. This made it rather trivial to reimagine the project as a rails application and get to work. The prompt was essentially to build a travel agency flight booking tool. To meet the prompt's requirements on the scale I felt fitting I found myself needing a list of airlines, their destination airports and a list of airports.

Sure this data could be stubbed out, but the whole goal of this adventure was to solve problems in Ruby, so why not learn how to scrape web pages?

The Tools
 - Nokogiri (Mechanize pre-req) http://nokogiri.org/

Desired Data
  - Airport's name, city, country and IATA code.
  - Airline names and destination airport IATA codes.

The Targets
 - Airports
 - Airlines

Wikipedia of course contains a more comprehensive list of airports then flighradar24. However, as we'll discuss in this blog, the highly structured form of flightradar24 makes it MUCH easier to scrape compared to wikipedia.

Part One - Parsing FlightRadar24
This being the first time I've ever scraped a web page I found myself looking for data formats that I could easily understand. A quick look at the source for http://www.flightradar24.com/data/airports/ presents a very well defined unordered list being styled into the pretty web page.
1:            <ul id="countriesList">  
2:  <li>  
3:       <a title="Airports in Albania" href="/data/airports/Albania">  
4:            <div class="left"><img class="lazy" src="http://www.flightradar24.com/data/images/destination/flags-big/albania.gif" alt="Albania flag" /></div>  
5:            <div class="right">Albania airports</div>  
6:       </a>  
7:  </li>  
8:  <li>  
9:       <a title="Airports in Algeria" href="/data/airports/Algeria">  
10:            <div class="left"><img class="lazy" src="http://www.flightradar24.com/data/images/destination/flags-big/algeria.gif" alt="Algeria flag" /></div>  
11:            <div class="right">Algeria airports</div>  
12:       </a>  
13:  </li>  

Within the unordered list are individual list elements containing the link and some image and formatting information. Mechanize is not a full featured HTML parser like Nokogiri, but it does have built in functionality to find links, click links, and fill out forms. These helpers use Nokogiri to locate elements within the dom and interact with them. Unfortunately a lot of the Nokogiri interaction is abstracted away from the user in these methods and there is a loss of functionality as a result, though I am not skilled enough to speak as to how much.

In order to parse this page, I relied almost exclusively on Mechanize's links.find_all method. Unfortunately the source code and documentation that I've been able to find has absolutely no mention of this method.

1:  agent = Mechanize.new  
2:  page = agent.get('http://www.flightradar24.com/data/airports/')  
3:  country_links = page.links.find_all { |l| l.attributes.parent.name == 'li'}  

For each link found on the page, the parent attribute's name is checked against the string. In this case if the parent element is a list element the link will be saved. We know from above that each link we want is wrapped by a list element, but we haven't proven that there are not other links on the page matching this pattern. Instead the assumption is made that this pattern is accurate enough and we proceed to the next step; clicking the links contained within country_links.

1:  country_links.each do |c|  
2:       if country_links.index(c) > 19  
3:            country = c.click  
4:            airport_links = country.links.find_all { |d| d.attributes.parent.name == 'div'}  

In the above snippet at line 2 you can see the result of the "accurate enough" assumption made earlier. I encountered an error where I was not being routed to the country pages as expected. By outputting the link and index of the array I was able to simply add a branch to ignore the first 20 indexes. While adding an if branch is not a terribly inefficient thing, ascertaining what condition to set is not always as easy.

The link is then clicked and the resulting page is passed to the variable country. Notice that Mechanize is smart enough to correctly handle the partial link being passed to it. Country is then parsed for links and the same pattern is executed against it except looking for links with a div element as an immediate parent this time. The resulting links are then added to the airport_links array and we move on to the next step.

1:  airport_links.each do |a|  
2:                 if airport_links.index(a).even?  
3:                      airport = a.click  
4:                      doc = airport.parser  

At line two we once again see the consequences of this method of data acquisition. I repeated the same troubleshooting method and realized that every even index had valid data. As a result we have another almost random if branch. With valid data in hand, the link is clicked, however this time we aren't looking for links; the page is instead passed to the doc variable after being run through Mechanize's .parser method. The doc variable may now be treated as a Nokogiri object, the full spectrum of Nokogiri methods, from .css to xpath may be run against it.

Airport Source
1:  <div class="lightNoise" style="padding: 6px 12px 12px; width: 225px;">  
2:                 <h3 class="noborder">Information</h3>  
3:                 <ul>  
4:                      <li><strong>Airport name: </strong>Albuquerque International Sunport</li>  
5:                      <li><strong>IATA code: </strong>ABQ</li>  
6:                      <li><strong>ICAO code: </strong>KABQ</li>  
7:                      <li><strong>Country: </strong>United States</li>  
8:                      <li><strong>State: </strong>NM</li>                    <li><strong>City: </strong>Albuquerque</li>  
9:                      <li><strong>Time: </strong>17:44 (UTC: 00:44)</li>  
10:                      <li><strong>Latitude: </strong>35.040218</li>  
11:                      <li><strong>Longitude: </strong>-106.609001</li>  
12:                      <li><strong>Altitude: </strong>5355 feet</li>                    <li><strong>Homepage: </strong><a target="_blank" title="Visit homepage" href="">Visit homepage</a></li>  
13:                      <li><strong>Airport info: </strong><a target="_blank" title="Visit Flightstats" href="http://www.flightstats.com/go/Airport/airportDetails.do?airportCode=ABQ">Visit Flightstats</a></li>  
14:                      <li><strong>More info: </strong><a target="_blank" title="Visit Great Circle Mapper" href="http://gc.kls2.com/airport/ABQ">Visit Great Circle Mapper</a></li>  
15:                 </ul>  
16:            </div>  

Parser
1: doc.css('div.lightNoise li').each do |l|  
2:                           if doc.css('div.lightNoise li').index(l) == 0  
3:                                temp_storage[:name] = l.text.slice(14, l.text.length)  
4:                                #puts temp_storage[:name]  
5:                           end  
6:                           if doc.css('div.lightNoise li').index(l) == 1  
7:                                temp_storage[:i_code] = l.text.slice(11, l.text.length)  
8:                                #puts temp_storage[:i_code]  
9:                           end  
10:                           if doc.css('div.lightNoise li').index(l) == 3  
11:                                temp_storage[:country] = l.text.slice(9, l.text.length)  
12:                                #puts temp_storage[:country]  
13:                           end  
14:                           if doc.css('div.lightNoise li').index(l) == 4  
15:                                temp_storage[:city] = l.text.slice(6, l.text.length)  
16:                                #puts temp_storage[:city]  
17:                           end  
18:                      end  
19:                      puts temp_storage[:name]  
20:                      storage.push(temp_storage.clone)  

Using Nokogiri's .css selectors parsing an airport page is relatively trivial. In line one of Airport Source you can see there is a div with the class 'lightnoise'. Contained within this div at lines 4, 5,7 and 8 is Airport name, IATA code, Country and City.

In line one of the parser you can see that the selector follows the convention of delineating between classes and ID's with a period for classes, a hash mark is used for IDs. The parser also accepts child elements delineated by a space. You can see this at work in line one as well where I built an array of all list elements within div.lightnoise.  This array was then iterated through and since the list elements are ordered identically from page to page it was possible to extract data based on index location. The first part of the returned text data is always the same as well, so the excess was removed using ruby's string.slice method. This returned only the dynamic data; in this case the name "Albuquerque International Sunport", IATA code "ABQ", city "Albuquerque", and country "United States".

While the target data was successfully retrieved, the methods used to retrieve target links in this section are undeniably inefficient and error prone. Requiring two different types of branches to ensure the right interactions isn't necessarily terrible, but on a larger project this would have been much more unwieldy.

Source
Scraper: https://github.com/islador/mechanize_scrapers/blob/master/v2.rb
Output: https://github.com/islador/mechanize_scrapers/blob/master/seed/airports.yml


Part Two - Parsing wikipedia

Taking the experience I garnered parsing FlightRadar24 I began work on gathering the Airline data I would need. The best source I found for the data was Wikipedia, specifically http://en.wikipedia.org/wiki/Airline_codes-All . Fortunately the start point for this scrape is a highly regimented table.

As before we start with instantiating a Mechanize instance and visiting a webpage.

1:  agent = Mechanize.new  
2:  begin  
3:       page = agent.get('http://en.wikipedia.org/wiki/Airline_codes-All')  
4:  rescue Mechanize::ResponseCodeError => e  
5:       puts e.to_s  
6:  end  

You'll notice that this time the process is wrapped in some basic error handling. Mechanize has a few error types you can find more detail on by checking the source. This specific rescue block catches 404 and other HTTP response codes.

From there, the page is parsed and sent to a helper function.

1:  airline_code_parser = page.parser  
2:  storage = []  
3:  destination_airports = []  
4:  #find all non-red airline links in the table  
5:  airlines = extract_column_airlines(airline_code_parser, "Airline")  

1:  def extract_column_airlines(page, column_name)  
2:       airport_links = []  
3:       table_width = 0  
4:       airport_index = 0  
5:       #parse the table head and return the index of the "Airport" column as well as the total column count.  
6:       page.css('table.toccolours.sortable th').each do |c|  
7:            if c.text.eql?(column_name)  
8:                 airport_index = page.css('table.toccolours.sortable th').index(c)  
9:                 table_width = page.css('table.toccolours.sortable th').length  
10:                 break  
11:            end  
12:       end  
13:       #parse the table rows  
14:       page.css('table.toccolours.sortable tr').each do |tr|  
15:            #for each each row, parse the columns  
16:            tr.css('td').each do |td|  
17:                 #if the column's index modulous the width is equal it to the desired index  
18:                 if tr.css('td').index(td)%table_width == airport_index  
19:                      #retrieve the link within that column.  
20:                      td.css('a').each do |a|  
21:                           if a['href'].to_s.include?("redlink=1") == false  
22:                                airport_links.push(a['href'].clone)  
23:                           end  
24:                      end  
25:                 end  
26:            end  
27:       end  
28:       return airport_links  
29:  end  

The helper function takes two arguments, a Nokogiri object and a string. The method then locates the table with class="tocolours sortable" and iterates through the table header elements. When it finds a header matching the supplied string it saves the index of that header as well as the length of the header array being iterated through.

The method then iterates through each row in the table extracting the links contained within the column matching the index retrieved earlier. You'll notice that "redlink=1" results in the link being discarded. Links with "redlink=1" in the href are known bad links, so by doing this a large portion of the links in the table are automatically discarded as they're known to lead nowhere. Links that are not discarded are pushed onto the array airport_links and the array is returned at the end of the method.

1:  airlines = extract_column_airlines(airline_code_parser, "Airline")  
2:  #iterate through those links  
3:  airlines.each do |d|  
4:       puts d  
5:       #Visit each airline page  
6:       begin  
7:       airline = agent.get('http://en.wikipedia.org' + d)  
8:       rescue Mechanize::ResponseCodeError, StandardError => e  
9:            puts "Error fetching airline pages: " + e.to_s  
10:       end  
11:       #Build the nokogiri object  
12:       airline_parser_object = airline.parser  
13:       #Extract the airline name from the page title by trimming off the wikipedia suffix  
14:       airline_name = airline_parser_object.title.slice(0, airline_parser_object.title.length-35)
15:       #search for a main destinations article
16:       main_dest_page = find_main_destinations(airline_parser_object)

The returned value is then cast to an array that is then iterated through. You'll notice that in this situation we will never have to visit the source page again as we've already retrieved all useful information from it. Each link is then appended to the domain and visited, in line 8 there is also the addition of StandardError to the rescue block. This is done because Net will throw a getaddrinfo error if d begins with 'http://', so rather then sanitize the data, I elected to catch the error. Then a nokogiri object is made, the Airport's name is extracted from the title and the nokogiri object is passed to the helper method below.

1:  def find_main_destinations(page)  
2:       storage = ""  
3:       page.css('div.rellink a').each do |ad|  
4:            if ad['href'].slice(ad['href'].length-13, ad['href'].length).eql?("_destinations")  
5:                 #returns a string  
6:                 storage = storage + ad['href']  
7:                 break  
8:            end  
9:       end  
10:       return storage  
11:  end  

The find_main_destinations method was built in response to the fact that not all airline pages on wikipedia have a list of destination airports. Some have a separate page linked in the destinations section that links to a list or table of airports. Fortunately all of these relevant links are wrapped in divs with a class of 'rellink' so isolating them is fairly straight forward. I then compared the last thirteen characters in the href and if they were equal to "_destinations" the href is appended to storage. In this manner if a proper rellink is not found an empty string is returned by the method.

1:  if main_dest_page.empty?  
2:            puts "Main dest not found."  
3:            h2_index = 0  
4:            #Search for the destinations section.  
5:            airline_parser_object.css('div#mw-content-text h2').each do |c|  
6:                 #puts "Parsing links"  
7:                 if c.css("span.mw-headline").text.eql?"Destinations"  
8:                      h2_index = airline_parser_object.css('div#mw-content-text h2').index(c)  
9:                 end  
10:            end  

There are two common situations, a different page containing the destinations and the same page containing the destinations. In the event that main_dest_page is empty, meaning there is no separate page, it is necessary to check for a destinations section and if found, to check it for destination airports. Line 5 checks for destination by searching for each h2 within the div mw-content-text, which is the primary content div used by wikipedia. Each found h2 is then further explored by checking the child span mw-headline element's content. If it is equal to "Destinations" then the index is saved as h2_index.

1:  if h2_index != 0  
2:                 destinations = extract_links(["h2"], h2_index, airline_parser_object) 

Assuming h2_index isn't still zero, it, along with an array of delimiter tags and the current nokogiri object are passed to the extract_links method. Be aware that in the unlikely event that an h2_index of 0 is actually the real index of the destination h2, the destination section will not be processed.

1:  def extract_links(delim_tags, index, page)  
2:       extract_ranges = [index...index+1]  
3:       doc = page  
4:       extracted_links = []  
5:       i = 0  
6:       # Change /"html"/"body" to the correct path of the tag which contains this list  
7:       (doc/"html"/"body"/"div").children.each do |el|  
8:        if (delim_tags.include? el.name)  
9:         i += 1  
10:        else  
11:         extract = false  
12:         extract_ranges.each do |cur_range|  
13:          if (cur_range.include? i)  
14:           extract = true  
15:           break  
16:          end  
17:         end  
18:         if extract  
19:              el.children.each do |d|  
20:                   d.css("a").each do |k|  
21:                        #destination = agent.get('en.wikipedia.org' + k['href'])  
22:                        extracted_links.push(k['href'].clone)  
23:                   end  
24:              end  
25:         end  
26:        end  
27:       end  
28:       return extracted_links  
29:  end  

The extract_links method is a modified version of Dan Healy's code at Stack Overflow.  This code pulls each link contained between the passed in index and the next index of the delimiter tag. This is accomplished by first navigating to the location of the h2's, and iterating through the child elements. Each child is then checked if it matches the delimiter, if it does, i is incremented. If it does not, i remains the same and extract_ranges are iterated through. Should the extract_ranges include i, then extract is set to true and the extract_ranges.each loop is broken. If extract is set to true, then for each child,  search for a link and push that link to the extracted_links array. Repeat until i is no longer found within the extract_range. The method then returns the extracted_links array.

This means that all links between two h2's are returned. This methodology is not very precise, but it ensures that all links within the Destinations section of a page will be checked.

1:  destinations.each do |d|  
2:                      begin  
3:                           #puts "Querying page: " + d  
4:                           page = agent.get('http://en.wikipedia.org' + d)  
5:                           nokogiri_page = page.parser  
6:                           #store that IATA code  
7:                           destination_airports.push(extract_iata_code(nokogiri_page))  
8:                      rescue Mechanize::ResponseCodeError, StandardError => e  
9:                           puts "Error fetching airline destination IATA codes: " + e.to_s  
10:                      end  
11:                 end  
12:            end  

The returned links are then iterated through. Each link is visited and if the page doesn't throw an error, the page is parsed into a nokogiri object and passed to the extract_iata_code method.

1:  def extract_iata_code(page)  
2:       page.css('th a').each do |d|  
3:            if d['href'].eql?("/wiki/International_Air_Transport_Association_airport_code")  
4:                 return d.next_element.text  
5:                 break  
6:            end  
7:       end  
8:  end  

The extract_iata_code method searches the page for table headers with links. For each one found, it checks if the href matches the wikipedia page of IATA. If that link is found, then the next element's text is returned. You can see the table being parsed on the right of this page. Each returned value is pushed onto the destination_airports array and this branch ends.

On the next branch, airport pages with main destination pages are handled.

1:  else  
2:            puts "Main dest found."  
3:            begin  
4:                 airline_destination = agent.get('http://en.wikipedia.org' + main_dest_page)  
5:                 airline_destination_parser = airline_destination.parser  
6:                 airport_links = extract_column_airports(airline_destination_parser, "Airport")  

This branch is called when the main_dest_page variable is not empty. The page contained in the variable is visited and a nokogiri object is made. That nokogiri object is then passed to the extract_column_airports method along with the name of the column to be parsed for links, in this case "Airport".

1:  def extract_column_airports(page, column_name)  
2:       airport_links = []  
3:       table_width = 0  
4:       airport_index = 0  
5:       #parse the table head and return the index of the "Airport" column as well as the total column count.  
6:       page.css('table.wikitable.sortable th').each do |c|  
7:            if c.text.eql?(column_name)  
8:                 airport_index = page.css('table.wikitable.sortable th').index(c)  
9:                 table_width = page.css('table.wikitable.sortable th').length  
10:                 break  
11:            end  
12:       end  
13:       #parse the table rows  
14:       page.css('table.wikitable.sortable tr').each do |tr|  
15:            #for each each row, parse the columns  
16:            tr.css('td').each do |td|  
17:                 #if the column's index modulous the width is equal it to the desired index  
18:                 if tr.css('td').index(td)%table_width == airport_index  
19:                      #retrieve the link within that column.  
20:                      td.css('a').each do |a|  
21:                           airport_links.push(a['href'].clone)  
22:                      end  
23:                 end  
24:            end  
25:       end  
26:       return airport_links  
27:  end  

The extract_column_airports method is functionally identical to the extract_column_airlines method detailed above. The one major difference is that it targets a different class of table element, the "wikitable" instead of the "toccolours" table.

1:  airport_links.each do |d|  
2:                      begin  
3:                           page = agent.get('http://en.wikipedia.org' + d)  
4:                           nokogiri_page = page.parser  
5:                           #store that IATA code  
6:                           destination_airports.push(extract_iata_code(nokogiri_page))  
7:                      rescue Mechanize::ResponseCodeError, StandardError => e  
8:                           puts "Error fetching airport IATA codes from main destination article: " + e.to_s  
9:                      end  
10:                 end  
11:            rescue Mechanize::ResponseCodeError, StandardError => e  
12:                 puts "Error fetching main airline destination article: " + e.to_s  
13:            end  
14:       end  

Each returned airport_link is then iterated through and the IATA code is extracted in the same manner as above. IATA codes are then pushed onto the destination_airports array and the destination handling if branch ends.

1:  storage.push({name: airline_name.clone, destinations: destination_airports.clone})  
2:       #then reset the destination airports array to empty.  
3:       destination_airports = []  
4:       if airlines.index(d)%50 == 0  
5:            puts "Writing airlines#{airlines.index(d)}"  
6:            FileUtils.mkdir_p "./seed/"  
7:            File.open("./seed/airlines#{airlines.index(d)}.yml",'w') do |out|  
8:            YAML.dump(storage, out)  
9:            end  
10:       end  
11:  end  
12:  puts "Converting final array to yaml."  
13:  FileUtils.mkdir_p "./seed/"  
14:  File.open("./seed/airlines.yml",'w') do |out|  
15:       YAML.dump(storage, out)  
16:  end  

Within the airlines loop, after the destination_airports array has been built for the current airline, a hash is built consisting of the name of the airline and the destinations the airline flies to. This hash is then pushed onto the storage array, so the storage array consists of hashes made up of a string and an array containing strings. The destination_airports array is then reset to empty. This prevents writing IATA codes from all airlines parsed to each airline's destination.

At every fifty airlines scraped, the storage array is output to a YAML file. This file is dynamically renamed, so there is heavy data duplication. However this allows the process to error out or otherwise fail and for the user to easily restart it from a recent point in the airlines array by adding a simple if block. Considering that this scraper will have to visit in excess of 15,000 web pages, I consider this a worthwhile use.

Lastly, once the scraper finishes and the airlines loop exits, a master airlines.yml is saved to the drive.

Sources
Scraper: https://github.com/islador/mechanize_scrapers/blob/master/airline_scraper_v2.rb
Output: https://github.com/islador/mechanize_scrapers/blob/master/seed/airlines.yml

Monday, November 4, 2013

Asynchronously Rendering Partials with custom jQuery/AJAX and Rails 3.2.14

I recently spent a while learning how to do custom AJAX queries with jQuery and Rails 3.2.14. While I'm well within the realm of a Junior Rails developer, I knew NOTHING about jQuery and only slightly more about AJAX. This post is going to tell a story, but I'm also going to show you some things I learned and haven't seen well documented elsewhere.

While researching and rapid prototyping code, I realized that there are a lot of pitfalls when you're glueing rails and jquery's AJAX functionality together. The biggest pitfall is silent failure in your .js.erb file, but there are plenty of others along the way, which Steve Schwartz over at AlfaJango.com has a great guide on. As I compiled information from the various sources, I realized the documentation for rails AJAX functionality is firmly couched in form_for and "remote: true" in particular. In fact, "remote: true" works for links, buttons, and even form_for's cousin, form_tag. Sadly "remote: true" doesn't work for anything custom, so almost all the documentation out there leaves you wondering. Sure there are some good guides out there like Steve Schwartz's over at AlfaJango.com but even they leave some of the components out.

Custom AJAX methods rely on several components in rails 3.2.14; the parent view, the jQuery & AJAX component itself, the controller method, the supporting .js.erb file and the partial being injected into the page. I won't go so far as to say these are all necessary and this is the only way, but these are the pieces that had to come together for me. These combined to form a nice little loop complete with silent failures and some very loud failures.

Using jQuery, an AJAX call may be made by any function for any reason, this is delightfully powerful, but can also be troublesome. I was trying to fire an AJAX call when the user changed the value in a select box. This required that the parent view containing the select boxes render first. You can see an example select box below taken directly from my app.

1:  <select id="station_select">  
2:    <% station_list = current_user.market_item_summaries.pluck('station_id').uniq %>  
3:    <option value="All">All Stations</option> <!-- The top option is the default option -->  
4:    <% station_list.each do |s|%>  
5:     <option value="<%= s.to_s %>"> <%=Station.where("station_id = ?", s).pluck("name")[0]%> </option>  
6:    <% end %>  
7:   </select>  

So this select box is rendered when the user visits the parent view. You'll see that it has an 'id' value and each option has a dynamic value. The 'id' value is standard HTML and is used by the jQuery to determine what to watch. jQuery can also watch css classes and a few other things, but I prefer ids personally. In order to fire off a jQuery AJAX call I elected to use the 'change' bind, though jQuery has a pile of really nifty events you can use for your functions. So when a user selects an option, the jQuery below is invoked by the browser.

1:  $(document).ready(function(){  
2:      //On change, trigger this function  
3:      $("#station_select, #listing_character, #owner, #type").change(function(){  
4:          //Load all select's values into variables  
5:          var ss = $("#station_select").val();  
6:          var ls = $("#listing_character").val();  
7:          var ow = $("#owner").val();  
8:          var ty = $("#type").val();  
9:          //Fire an AJAX call to marketsummaries/filter  
10:        $.ajax({  
11:            url: "marketsummaries/filter", type: "GET",  
12:            //Pass in each variable as a parameter.  
13:            data: { station_id: ss,   
14:                listing_character_id: ls,   
15:                owner_id: ow,   
16:                type: ty }  
17:        });  
18:      });  
19:  });  

You can see the comments explaining the cycle of events in rough detail, but let's break it down even more.
1:  $(document).ready(function(){  
2:  });  

The above function wraps our entire jQuery function set. This ensures that the document is fully loaded before any functions are called. While this may seem ancillary at first, ensuring the page is fully loaded is an excellent means of heading off bugs at the pass.

1:      $("#station_select, #listing_character, #owner, #type").change(function(){  
2:      });  

After the document is ready, you see the function declaration. jQuery distinguishes itself from JavaScript with a dollar sign before all functions. Knowing this you can see that my jQuery function actually includes several plain JavaScript elements. That aside, within my function declaration you can see that I am calling the selector "#station_select", jQuery uses the number sign to distinguish an id from a class, which would be represented with a dot. ".station_select" for class vs "#station_select" for id. You also see that a comma may be used as a delineator between multiple selectors. Each selector in the above snippet selects an id. The function is then bound to the change event, so when the value of either of the <select>'s changes, this function will fire.

1:          var ss = $("#station_select").val();  
2:          var ls = $("#listing_character").val();  
3:          var ow = $("#owner").val();  
4:          var ty = $("#type").val();  

In the above snippet you can see a combination of regular Javascript and jQuery. "var ss =" is JavaScript, declaring a variable, meanwhile everything right of the equals operator is jQuery. Each variable is assigned the current value of the <select> with the listed id. While this may be rather straight forward, the fact that jQuery and JavaScript can be so easily intermingled may not be as forthcoming as one might expect.

1:        $.ajax({  
2:            url: "marketsummaries/filter", type: "GET",  
3:            //Pass in each variable as a parameter.  
4:            data: { station_id: ss,   
5:                listing_character_id: ls,   
6:                owner_id: ow,   
7:                type: ty }  
8:        });  

Here you can see the actual AJAX call itself. You'll notice that ajax is actually a function within jQuery. The ajax function has a series of arguments that it takes, the primary arguments are; 'url', 'type', 'data' and 'success', the last of which I will explain in greater detail at the end. All arguments are comma delineated. The url arg determines where the AJAX call will be pointed towards, you'll notice this makes it very simple to dovetail it with a controller method. The type arg supports all four REST types and defaults to a GET call, I chose to explicitly define it. The data arg takes a comma delineated hash and will pass the data as query string parameters to the target url. Once again, you'll notice that this dovetails quite well with rails controllers.

1:            data: { station_id: ss,   
2:                listing_character_id: ls,   
3:                owner_id: ow,   
4:                type: ty }  

Each variable is passed as an element of the hash, this means that 'ss' will be accessible in the controller as params[:station_id]. The exact query passed to the server will look like;
1:  marketsummaries/filter?station_id=60000025&listing_character_id=All&owner_id=All&type=All  

Once the AJAX call is made, your controller must be waiting to receive it. This requires that you have a supporting route in your routes.rb file and a properly named method in your controller.
1:  routes.rb  
2:  match "/marketsummaries/filter", to: 'MarketItemSummaries#filter', via: :get  

1:  market_item_summaries_controller.rb  
2:  def filter()  
3:          @input = {"station_id" => nil, "listing_character_id" => nil, "owner_id" => nil, "type" => nil}  
4:          @input["station_id"] = params[:station_id]  
5:          @input["listing_character_id"] = params[:listing_character_id]  
6:          @input["owner_id"] = params[:owner_id]  
7:          @input["type"] = params[:type]  
8:
9:          (Whatever your method needs to do, build a query, what have you.)
10:
11:          respond_to do |format|  
12:              format.js  
13:          end  
14:      end  

Without the routes entry, the rails router will reject the AJAX call with a 403 forbidden leaving your controller's method untouched. The controller method itself must share the name specified in the AJAX call's url. Above you can see me taking each parameter and passing it to an instance hash. From there I can work with the hash more easily then individual parameters.

1:          respond_to do |format|  
2:              format.js  
3:          end  

This last function is extremely important and often missed. "respond_to" allows you to specify what type of response the controller is to send to a given call. You can specify html, xml, pretty much any format. In this case, the controller is going to respond with JavaScript. Because there is no additional block after "format.js", rails defaults to checking for a file "<method_name>.js.erb", in this case "filter.js.erb".

1:  filter.js.erb  
2:  $('#corporationtable').empty().append("<%= escape_javascript(render(:partial => 'corporation_table', locals: {mis: @mis})) %>");  

Like .html.erb files, .js.erb files are run through the embedded ruby pre-processor. This means you can dynamically embed different JavaScript commands into the page before it is actually run through the user's browser. This particular .js.erb file has some important parts though;

1:  $('#corporationtable')  

The above component uses jQuery to target a specific id. While it is possible to target an id on a partial, the jQuery selectors break when dom elements are reloaded through AJAX. For this reason "#corporationtable" is a div on the parent view, not on the partial that is going to be loaded.

1:  .empty().append()  

Normally when you go looking for proper syntax in a jQuery/AJAX .js.erb file, you'll see .html(), not .empty().append(). During my research I stumbled across an excellent, if tangential, blog post by Joe Zim explaining that while .html() does work, the reason it works is not officially supported. This means the jQuery team may randomly patch that functionality out and not tell anyone, so like him, I've elected to use .empty().append() directly rather then through .html(). It is important to note that .empty() destroys all child elements within the specified dom element.

1:  .append("<%= escape_javascript(render(:partial => 'corporation_table', locals: {mis: @mis})) %>");  

The .append() method takes html dom elements as well as html strings as arguments. Combining this with erb to use rails' render to generate a partial and append it to the div specified earlier allows for any element of the page to be updated. You'll notice that the render itself is wrapped in "escape_javascript()". This is done because the majority of partials being rendered will contain html elements that would otherwise escape the double quotes that .append() uses to delineate between start and finish. Kikito over at Stack Overflow does an excellent job explaining this.

1:  locals: {mis: @mis}  

The last crucial element within the .js.erb file is "locals:". The above code takes the variable "@mis" from my controller and makes it available to the partial as "mis". There are two key reasons to use locals, firstly; when rendering a partial in multiple places locals allow you to ensure you're passing it all necessary variables. Secondly; a partial being loaded into an existing view will not have access to the variables that the parent view does.

With filter.js.erb in place, the last necessary element is the partial itself. I want my users to be able to see the partial before selecting options from the <select> boxes, so the partial is actually loaded as an element within the parent view; show.html.erb.

1:  show.html.erb  
2:  <div id="corporationtable">  
3:   <%= render :partial => 'corporation_table', locals: {mis: @mis}%>  
4:  </div>  

As discussed earlier in filter.js.erb, "corporationtable" is the target of the jQuery append() function. This means the new partial is rendered in place of the old one. The partial itself can be seen below.

1:  
2:  <div class="corptable">  
3:   <table class="table table-bordered">  
4:    <thead>  
5:     <tr>  
6:      <h2>Corporation Item Summaries</h2>  
7:     </tr>  
8:    </thead>  
9:    <tr>  
10:     <table class="table table-condensed" style="border-collapse:collapse;"> <!-- Accordioned Table -->  
11:      <tbody id="corp_tbody">  
12:       <% mis.each do |n|%>  
13:       <% if n.entity == 1 %>  
14:         <tr data-toggle="collapse" data-target=".demo<%=mis.index(n)%>" data-target="#data4" class="accordion-toggle"> <!-- First 2 rows are the mis -->  
15:           <td><b> <%= Item.where("type_id = ?", n.type_id).pluck('name')[0] %> </b></td>  
16:           <td> <%= Station.where("station_id = ?", n.station_id).pluck('name')[0] %> </td>  
17:           <td> <%= Character.where("char_id = ?", n.char_id).pluck('name')[0] %> </td>  
18:           <td class="text-success"> <% char = current_user.characters.where("char_id = ?", n.char_id) %>  
19:            <%= Corporation.where("character_id = ?", char).pluck('name')[0] %> </td>  
20:           <td <% if n.bid == false %>   
21:              class="text-success">Sell  
22:             <% else %>  
23:              class="text-error">Buy  
24:             <% end %> </td>  
25:         </tr>  
26:         <tr data-toggle="collapse" data-target=".demo<%=mis.index(n)%>" data-target="#data4" class="accordion-toggle"> <!-- Second line of the mis -->  
27:           <td> APP <%= n.average_purchase_price %> </td>  
28:           <td> ASP <%= n.average_sale_price %> </td>  
29:           <td> APM <%= n.average_percent_markup %> </td>  
30:           <td> TVE <%= n.total_vol_entered %> </td>  
31:           <td> TVR <%= n.total_vol_remaining %> </td>  
32:         </tr>  
33:         <% mo = MarketOrder.where("market_item_summary_id = ?", n.id) %>  
34:         <% mo.each do |m| %>  
35:          <tr> <!-- Each Row after that with TD's classed as hiddenRow are individual orders -->  
36:           <td class="hiddenRow">  
37:             <div class="accordian-body collapse demo<%=mis.index(n)%>"> <%= Item.where("type_id = ?", m.type_id).pluck('name')[0] %></div>  
38:           </td>  
39:           <td class="hiddenRow">  
40:             <div class="accordian-body collapse demo<%=mis.index(n)%>"> <%= m.price %></div>  
41:           </td>  
42:           <td class="hiddenRow">  
43:             <div class="accordian-body collapse demo<%=mis.index(n)%>">%markup</div>  
44:           </td>  
45:           <td class="hiddenRow">  
46:             <div class="accordian-body collapse demo<%=mis.index(n)%>"> <%= m.vol_entered %></div>  
47:           </td>  
48:           <td class="hiddenRow">  
49:             <div class="accordian-body collapse demo<%=mis.index(n)%>"> <%= m.vol_remaining %></div>  
50:           </td>  
51:          </tr>  
52:         <% end %>  
53:        <% end %>  
54:       <% end %>  
55:      </tbody>  
56:     </table>  
57:    </tr>  
58:   </table>  
59:  </div>  

While the partial itself is nothing special, there is one element I would like to draw attention to. Line 11 contains a table body element with the id "corp_tbody". If you look over the code, you'll realize that if there are no valid mis values, corp_tbody will be entirely blank. The architecture of my app does not allow this entire partial to be easily removed. However it is possible to use jQuery and the .ajax() function's success method to hide the entire partial any time corp_tbody is empty.

1:  $(document).ready(function(){  
2:      //On change, trigger this function  
3:      $("#station_select, #listing_character, #owner, #type").change(function(){  
4:          //Load all select's values into variables  
5:          var ss = $("#station_select").val();  
6:          var ls = $("#listing_character").val();  
7:          var ow = $("#owner").val();  
8:          var ty = $("#type").val();  
9:          //Fire an AJAX call to marketsummaries/filter  
10:        $.ajax({  
11:            url: "marketsummaries/filter", type: "GET",  
12:            //Pass in each variable as a parameter.  
13:            data: { station_id: ss,   
14:                listing_character_id: ls,   
15:                owner_id: ow,   
16:                type: ty },  
17:            success: hider  
18:        });  
19:      });  
20:      function hider() {  
21:          //Hide or Show table divs based on whether the table's tbody has content.  
22:          if ($.trim($("#corp_tbody").html())=='') {  
23:              $("#corporationtable").hide()  
24:          }  
25:          if ($.trim($("#corp_tbody").html())!='') {  
26:              $("#corporationtable").show()  
27:          }  
28:          if ($.trim($("#char_tbody").html())=='') {  
29:              $("#charactertable").hide()  
30:          }  
31:          if ($.trim($("#char_tbody").html())!='') {  
32:              $("#charactertable").show()  
33:          }  
34:      }  
35:  });  

At line 20 you can see I've built a JavaScript function and taken advantage of jQuery's easy interoperability with JavaScript. The hider() function calls a series of JavaScript if branches (jQuery actually lacks an if/else mechanism entirely), within the first branch at line 22 I check for corp_tbody being empty. While jQuery does have couple variants on an empty method, according to Serge Shultz over at Stack Overflow, Chrome and Firefox don't play nice with those.

Up at line 17 you can see a very simple addition to the AJAX args list. "success: hider" calls the hider function when the AJAX request completes successfully. That is when the server returns an HTTP 200 to the client. Should the client not receive a 200 code, hider() will never fire. It is important to note that there are no parenthesis in "success: hider", I don't know why that is, but I do know that the function fails to perform reliably with parenthesis.

TL;DR
Asynchronously rendering a partial with jQuery/AJAX/Rails 3.2.14 requires a jQuery AJAX call, supporting routes, a controller with configured method, a properly formatted .js.erb file, a partial, and a target div on the parent view.

The AJAX call must be triggered, the URL it hits must be a valid route, the targeted controller method must be configured to respond with a JavaScript response, this will return a .js.erb file named <method>.js.erb which must contain a target div id and a properly escaped render command. The target div id should be present on the view that triggered the AJAX call, if it is not the partial might not reliably generate.

You can view the full code for my app's jQuery/AJAX/Rails interactions at the following links
routes.rb
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/config/routes.rb
Market Item Summaries Controller
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/controllers/market_item_summaries_controller.rb
Market Item Summaries JavaScript
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/assets/javascripts/market_item_summaries.js
Parent View
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/views/market_item_summaries/show_msi.html.erb
Corporation Table Partial
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/views/market_item_summaries/_corporation_table.html.erb
Character Table Partial
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/views/market_item_summaries/_character_table.html.erb
filter.js.erb
https://github.com/islador/market_monitor/blob/viewsMarketSummaries/app/views/market_item_summaries/filter.js.erb