MVC.. 4? Already?! Bundling, Minification, what??

If you’ve been following all the cool new stuff coming in MVC 4, you’ve probably heard of the new Bundling & Minification features. They are separate, but I like to think of them as parallel concerns. And apparently, Microsoft does too!

<link href="content/reset.css" rel="Stylesheet" />
<link href="content/styleguide.css" rel="Stylesheet" />
<link href="content/style.css" rel="Stylesheet" />
<link href="content/itemlist.css" rel="Stylesheet" />
<link href="content/css.css" rel="Stylesheet" />
<script src="script/widget.js" type="text/javascript"></script>
<script src="script/watermark.js" type="text/javascript"></script>
<script src="script/slideshow.js" type="text/javascript"></script>
<script src="script/dropdown.js" type="text/javascript"></script>

Look familiar? This is what many sites look like, and it actually presents us with two opportunities for optimization.

To minify, or not and how to minify

The practice of shrinking or ‘minifying’ script and stylesheet files has become pretty common lately, and there are many ways of doing so. Most methods involve either running a script during deployment that shrinks your existing JS files, or using one of many minification libraries to minify your files on the fly.

Sounds like a pain, but it doesn’t need to be. Not anymore.

Why eggs aren’t sold separately

What if eggs were sold separately; that wouldn’t be much fun. Not that eggs being sold in dozens is much fun either, but it would really be no fun at all to buy them individually. They’d always be rolling around in your cart and if you put anything on them they’d break and make a mess.

Unbundled scripts and stylesheets aren’t quite as messy as a couple dozen eggs rolling around in your cart as you shop, but they do slow down the browsing experience for anyone visiting your website. As soon as their browser hits your site it starts parsing HTML, and figuring out what else it needs to download before it can begin rendering it. Between stylesheets, scripts, and images, there can be a lot of files required. And for each of those files the browser needs to open a connection and request that file.

“Yeah, ok, 30 simultaneous connections.. who cares?” you may be asking yourself. Well, most modern browsers are limited to 6 or 8 simultaneous connections per domain, so if the browser has to get a few big CSS and JS files, it will have to wait for those to come down the pipe before requesting your images and whatever else is on your page. And if you’re using Web Forms, it’s doing all of that while chugging through that big viewstate download too, ugh!!

“But I use subdomain trickery to make your browser think that it’s downloading each script & image from a different server!” Well alright, you win. For the rest of us that want a simple solution though, here it is:

<link href="content/css" rel="Stylesheet" />
<script src="script/js" type="text/javascript"></script>

“Where did my files go?” Well they probably saved to your my doc- er wait, sorry. In all seriousness though, your files are still there. ASP.NET just took all of your CSS and script files, created one CSS file and one JS file, all bundled up and minified like scrambled eggs, and sent them to your browser. What you are actually doing is telling the framework that you want to take all “content/*.css” files and combine them into one (minified) file, and then the same for “script/*.js”.
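To show roughly what “bundled up and minified” means, here is a toy sketch in JavaScript. It is not what ASP.NET does internally: the bundleCss function and the sample file contents are made up for illustration, and the “minifier” only strips comments and whitespace.

```javascript
// Toy bundler: join the files in the order given, then shrink the result.
// The "minifier" only strips /* ... */ comments and collapses whitespace;
// a real minifier is far more careful than this.
function bundleCss(files) {
  const combined = files.map((file) => file.contents).join("\n");
  return combined
    .replace(/\/\*[\s\S]*?\*\//g, "")   // drop block comments
    .replace(/\s+/g, " ")               // collapse runs of whitespace
    .replace(/\s*([{};:,])\s*/g, "$1")  // no spaces around punctuation
    .trim();
}

// Hypothetical file contents standing in for content/*.css
const files = [
  { name: "content/reset.css", contents: "/* reset */\nbody {\n  margin: 0;\n}" },
  { name: "content/style.css", contents: "h1 {\n  color: red;\n}" },
];

console.log(bundleCss(files)); // body{margin:0;}h1{color:red;}
```

Two files go in, one short string comes out, and the browser only has to make one request for it.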

There are also some built-in helpers in the System.Web.Optimization namespace that allow you to do the same thing, with the bonus feature of helping you with caching:

<link href="@System.Web.Optimization.BundleTable.Bundles.ResolveBundleUrl("~/content/css")" rel="Stylesheet" />
<script src="@System.Web.Optimization.BundleTable.Bundles.ResolveBundleUrl("~/script/js")" type="text/javascript"></script>

What this actually does is make a hash of the file to be served and append that hash to the URL of the file that the browser requests. This could end up looking like this:

<link href="/Content/css?v=2nL7Ibd5cnOVWsLuIeyRaOhIF5ZHdQKX2-9LZ2jBpuVyS7kMr6iNEf9OqoHSqsfI" rel="Stylesheet" type="text/css" />
<script src="/script/js?v=VR7ocQF24ZSnoI7UDcmoFdsWKYA_9SNYPo4BPWsiNwD4uxScIysQcdzNgMgdioh8" type="text/javascript"></script>

And the beautiful thing is that any time the file changes, the hash will change too. Now you can keep those files cached on the client or anywhere else almost indefinitely and the client will never be stuck with a stale file.
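If the hash-in-the-URL idea seems abstract, this minimal JavaScript sketch may help. It is an assumption-laden illustration, not the algorithm System.Web.Optimization uses: the small FNV-1a hash and the versionedUrl helper are stand-ins I picked so the example runs anywhere.

```javascript
// Illustrative only: a tiny 32-bit FNV-1a hash standing in for whatever
// hash the framework really uses to build the "v" token.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical helper: append a hash of the bundle's contents to its URL.
function versionedUrl(path, bundleContents) {
  return path + "?v=" + fnv1a(bundleContents);
}

const before = versionedUrl("/content/css", "body { margin: 0; }");
const same = versionedUrl("/content/css", "body { margin: 0; }");
const after = versionedUrl("/content/css", "body { margin: 0; padding: 0; }");

console.log(before === same);  // identical contents give an identical URL
console.log(before === after); // changed contents give a different URL
```

Because the URL only changes when the contents change, a cached copy stored under the old URL is simply never requested again.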

Custom Bundles (I micro-manage my eggs)

You can also create custom bundles and force files to be bundled in order by dropping code such as this into the Application_Start() method inside your global.asax file.

Bundle mobileBundle = new Bundle("~/android", new JsMinify());
mobileBundle.AddFile("~/scripts/mobile/droid.js");
mobileBundle.AddFile("~/scripts/mobile/droid-dropdown.js");
mobileBundle.AddFile("~/scripts/mobile/droid-spin.js");
BundleTable.Bundles.Add(mobileBundle);

It will produce very similar results to the convention-based bundling but with more control for those who need it.

How to get started

At the time of this writing, you can get started with this feature right in Visual Studio 2010 by downloading the MVC 4 beta. In this case you will need to reference the System.Web.Optimization assembly (referenced by default on all MVC 4 projects). I haven’t tested it but I don’t see why you couldn’t do this in an ASP.NET project also.

The other option is to download the full-blown Visual Studio 11 beta. In this case you will still be using the System.Web.Optimization namespace, but this time it is included in v4.5 of System.Web.dll.

If you have any questions let me know, I’ve got some eggs to clean up.

Posted in .NET, MVC

Scraping Data off the Web

Ever since database-driven applications have been used to serve data on the web, users viewing that data have wanted to get at it programmatically. Whether it’s for automatically scouring an auction site for a hard to find classic Nintendo game or getting weather updates for your custom alarm clock application, you will probably start looking for a way to get that data into your program.

In this blog post, I will cover a basic comparison of browsers vs. scrapers, and then dive into some code to show you how to write a simple scraper to get the rating of a movie from IMDB.

Browsers vs. Scrapers

One important thing to realize is that ultimately, a website is often little more than an HTML formatter sitting on top of a database. When a request is made, the website will query data from a database, possibly manipulate it in some interesting way, and then pass the fields to some sort of HTML formatter (like a view engine). This HTML is then returned to the client’s browser as a response. This diagram shows how data is passed through to the browser in a typical HTTP request/response scenario.

[Illustration: Browser Request and Response]

Compare this to a scraper, which is essentially just an automated browser that doesn’t require human interaction to gather the data from the remote server.

[Illustration: Scraper Request and Response]

As you can see, the only difference is that the application is creating and sending the requests, and then parsing the meaningful data from the responses and persisting it to some data store (in this case a database).

A Simple Scraper

Before diving into code, you will want to investigate the site you want to scrape and figure out what kind of requests their servers are set up to handle. In this case we’ll be scraping http://www.imdb.com/. One thing to note is that IMDB has an API which is much easier to use than trying to scrape the site, but for illustration purposes we’ll assume that no such API exists.

The first thing we’ll do is go to the IMDB website and search for a movie name. In this case I’ll search for Tron: Legacy by typing it into the search box and hitting enter. Immediately I notice the URL at the top of the page: http://www.imdb.com/find?s=all&q=Tron%3A+Legacy. I know that %3A is the URL-encoded way of saying “:”, and + is the same as “ ”, so I can clearly see that if I visit the URL “http://www.imdb.com/find?s=all&q=” followed by the URL-encoded name of a movie, I should get the search results page.
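A quick way to convince yourself of the encoding is to rebuild that URL in a few lines of JavaScript. This is only a sanity check of the encoding rules, separate from the scraper itself, and the buildSearchUrl name is my own.

```javascript
// Build the search URL the same way the browser did.
// URLSearchParams uses form encoding, so ":" becomes %3A and " " becomes +.
function buildSearchUrl(movieName) {
  const query = new URLSearchParams({ s: "all", q: movieName });
  return "http://www.imdb.com/find?" + query.toString();
}

console.log(buildSearchUrl("Tron: Legacy"));
// http://www.imdb.com/find?s=all&q=Tron%3A+Legacy
```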

Note: For the code below to work we’ll need to reference System.Web (which means targeting the full-blown .NET framework, and not just the client profile). If you are targeting the Client Profile you will not see System.Web as an available reference.

And here’s the code:

static Single GetMovieRating(string movieName)
{
    var webClient = new System.Net.WebClient();
    // Build the search URL from the URL-encoded movie name (HttpUtility lives in System.Web).
    string searchUrl = "http://www.imdb.com/find?s=all&q=" + HttpUtility.UrlEncode(movieName);
    string searchResponse = webClient.DownloadString(searchUrl);
    throw new NotImplementedException("This function isn't done yet!");
}

If you examine the string response, you’ll see that it now contains the HTML returned by IMDB. You’ll also notice that the rating is not on this page, because it’s just a search results page. The next step will be to find the URL of the actual movie page which contains the rating that we’re looking for. We could parse out the DOM into an object and traverse through it, which would probably work pretty well in this case. There are downsides to doing this however: it’s very slow when all we want is a single piece of information, it uses a relatively large amount of memory, and it’s also a lot more complicated (code-wise) to do so. Because of that, we’ll just write a simple regular expression to find what we’re looking for.

You may have noticed that there are a few movies beginning with “1.”, followed by an anchor tag referencing the movie page. Instead of searching for one in a specific category (like Popular Titles, Exact matches, etc.), we’ll just search for the first one we find, just in case the user’s search didn’t give us a result in that category.

We’ll use this regex to do so:

1\..*?href="(?<MovieLink>[^"]*)"

And the following code:

Match movieUrlMatch = Regex.Match(searchResponse, @"1\..*?href=""(?<MovieLink>[^""]*)""");
if (!movieUrlMatch.Success) return 0.0f;
string movieResponse = webClient.DownloadString("http://www.imdb.com" + movieUrlMatch.Groups["MovieLink"].Value);

Now we have a movieResponse string, containing the HTML of the movie’s page. All we need to do now is create another regex to parse the rating, and then return it to the caller.

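// Look for a rating such as "7.1/10" in the movie page.
Match movieRatingMatch = Regex.Match(movieResponse, @"(?<Rating>\d\.\d)/10");
Single movieRating;
// Parse with the invariant culture so "7.1" is read correctly whatever the machine's locale is.
if (!movieRatingMatch.Success || !Single.TryParse(movieRatingMatch.Groups["Rating"].Value, System.Globalization.NumberStyles.Float, System.Globalization.CultureInfo.InvariantCulture, out movieRating)) return 0.0f;
return movieRating;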
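If you want to play with the two regular expressions without hitting IMDB at all, here is a small JavaScript sketch that runs them against made-up HTML. The searchHtml and movieHtml strings are invented stand-ins, not real IMDB markup, and the helper names are mine.

```javascript
// Made-up HTML fragments standing in for real IMDB responses.
const searchHtml =
  '<td>1.</td><td><a href="/title/tt0000001/">Some Movie</a> (2010)</td>' +
  '<td>2.</td><td><a href="/title/tt0000002/">Another Movie</a></td>';
const movieHtml = '<span class="rating">6.8/10</span>';

// Same patterns as the C# code, in JavaScript named-group syntax.
const linkPattern = /1\..*?href="(?<MovieLink>[^"]*)"/;
const ratingPattern = /(?<Rating>\d\.\d)\/10/;

// Returns the first link that follows "1.", or null when nothing matches.
function extractMovieLink(html) {
  const match = linkPattern.exec(html);
  return match ? match.groups.MovieLink : null;
}

// Returns the rating as a number, or 0 when nothing matches (like the C# version).
function extractRating(html) {
  const match = ratingPattern.exec(html);
  return match ? parseFloat(match.groups.Rating) : 0;
}

console.log(extractMovieLink(searchHtml)); // /title/tt0000001/
console.log(extractRating(movieHtml));     // 6.8
```

Testing patterns against a saved copy of a page like this is also kinder to the site than re-downloading it every time you tweak a regex.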

And that’s it! We’ve successfully scraped IMDB to get the rating of a movie. You can now call this function from a GUI or write an app to rename your files by prepending the rating to them.

Download source code here.

Upcoming Web Scraping Topics

  • Understanding the HTTP request
  • Tools & Techniques
  • Legal & Ethical Issues
  • Common Problems & Challenges
Posted in Scraping

jQuery Selectors Tutorial

jQuery is one of those things that sounds scary to most web designers, but really isn’t. It gives you great power over not only how the page you’re working on looks, but how it acts too.

I’m going to assume that if you’ve found this tutorial, you already know the basics of jQuery but are looking for more control over what’s selected. In this tutorial I will be explaining how selectors work to generate jQuery wrapped sets, and how you can use filters to accomplish things that basic selectors don’t allow you to do.

If you are at all familiar with CSS, you will immediately notice that many jQuery selectors are the same as the CSS selectors. Because this functionality is provided by jQuery, it does not rely on the browser to support the selector. Great, right? I think so.

Basic Tag Selection

The first thing you typically do when planning your selection is to examine the tags on your page, along with their ID and class. Here is a list of the basic tag selectors and how they may be used:

Basic Selectors

*
    Selects every element
    $('*') would match every element, regardless of tag

tag name
    Selects every element with a name matching tag name
    $('div') would match every <div> tag

tag.class
    Selects every element with a name matching tag and a class matching class
    $('div.separator') would match every <div class="separator"> tag

tag#id
    Selects every element with a name matching tag and an id matching id
    $('div#main') would match the <div id="main"> tag

Selecting by other Attributes

tag[attr]
    Selects every element with a name matching tag and with an attribute attr. The attribute must be present, but can have any value
    $('img[alt]') would match every <img alt="">, <img alt="anything">, etc.

tag[attr='abc']
    Selects every element with a name matching tag and with an attribute attr that has a value of abc
    $('label[for="main"]') would match every <label for="main"> tag

tag[attr^='abc']
    Selects every element with a name matching tag and with an attribute attr that has a value beginning with abc
    $('img[src^="http://mydomain.com"]') would match every <img> with a src that starts with http://mydomain.com

tag[attr$='abc']
    Selects every element with a name matching tag and with an attribute attr that has a value ending with abc
    $('img[src$=".gif"]') would match every <img> with a src that ends in .gif

tag[attr!='abc']
    Selects every element with a name matching tag and with an attribute attr that does not match abc, OR without an attribute attr
    $('label[for!="wrapper"]') would match every label except for <label for="wrapper">. It would match <label> also.

tag[attr*='abc']
    Selects every element with a name matching tag and with an attribute attr containing abc
    $('a[href*="blog"]') would match every <a> tag linking to a url containing the word blog

Combining Selectors (Relational Selectors)

tag descendant
    Selects every element with a name matching descendant that is anywhere inside an element with a name matching tag
    $('div input') would select any <input> that is anywhere inside a <div> (even if it is deeply buried)

tag, anothertag
    Selects every element with a name matching tag or anothertag
    $('div, span') would select all elements that are either <div> or <span>

tag > child
    Selects every element with a name matching child that is directly inside an element matching tag
    $('a > img') would select every <img> directly inside of an <a>

tag + next sibling
    Selects every element with a name matching next sibling that comes directly after an element named tag
    $('label + input') would select all <input> tags that come directly after a <label>

tag ~ any sibling
    Selects every element with a name matching any sibling that comes after an element named tag (within the same parent)
    $('img ~ a') would select all <a> that come after an <img> tag and share the same parent
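
The attribute operators are the ones people mix up most often, so here is a plain-JavaScript sketch of the string test each one boils down to. This is not jQuery’s implementation; attributeTests and attributeMatches are names I made up, and elements are modeled as simple objects so the example runs without a page.

```javascript
// Plain-JavaScript sketch of what each attribute operator tests.
// "value" is the attribute's value, or undefined when the attribute is absent.
const attributeTests = {
  "":   (value)         => value !== undefined,                              // [attr]
  "=":  (value, target) => value === target,                                 // [attr='abc']
  "^=": (value, target) => value !== undefined && value.startsWith(target),  // [attr^='abc']
  "$=": (value, target) => value !== undefined && value.endsWith(target),    // [attr$='abc']
  "*=": (value, target) => value !== undefined && value.includes(target),    // [attr*='abc']
  "!=": (value, target) => value !== target,                                 // [attr!='abc'], also true when absent
};

function attributeMatches(attributes, name, operator, target) {
  return attributeTests[operator](attributes[name], target);
}

// Hypothetical elements, modeled as attribute dictionaries.
const image = { src: "http://mydomain.com/images/logo.gif", alt: "" };
const bareLabel = {};

console.log(attributeMatches(image, "alt", ""));                          // true: alt is present, even though empty
console.log(attributeMatches(image, "src", "^=", "http://mydomain.com")); // true
console.log(attributeMatches(image, "src", "$=", ".gif"));                // true
console.log(attributeMatches(bareLabel, "for", "!=", "wrapper"));         // true: the attribute is missing
```

The last line is the surprising one: != matches elements that do not have the attribute at all, which is why $('label[for!="wrapper"]') also picks up a bare <label>.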

Filters

Filters are another critical piece of the puzzle, and can fill any gaps that the basic selectors may leave. One common example of a filter being used is to select every other row in a table (to alternate colors for example). To do that you could use a selector like $('table tr:nth-child(even)'). For this portion of the tutorial I’m going to try something new: a live demonstration of each of the basic filters.

Live jQuery Samples
Basic Filters
Filter / Live jQuery Sample / Explanation

:first
    $('tbody tr:first')
    Selects the first TR in the TBODY

:last
    $('tbody tr:last')
    Selects the last TR in the TBODY

:even
    $('tbody tr:even')
    Selects every TR with an even index* (the 1st, 3rd, 5th, ... rows)

:odd
    $('tbody tr:odd')
    Selects every TR with an odd index* (the 2nd, 4th, 6th, ... rows)

:eq(n)
    $('tbody tr:eq(6)')
    Selects the TR at index 6* (the 7th row)

:gt(n)
    $('tbody tr:gt(6)')
    Selects every TR after index 6* (the 8th row onward)

:lt(n)
    $('tbody tr:lt(8)')
    Selects every TR before index 8* (the first 8 rows)

Child Filters
Filter / Live jQuery Sample / Explanation

:first-child
    $('tbody tr td:first-child')
    Selects the first child TD of each TR

:last-child
    $('tbody tr td:last-child')
    Selects the last child TD of each TR

:only-child
    $('tbody tr td:only-child')
    Selects each TD that has no siblings within its TR

:nth-child(n)
    $('tbody tr:nth-child(15)')
    Selects the 15th* TR in the TBODY

:nth-child(even|odd)
    $('tr td:nth-child(even)')
    Selects every other TD in each TR (the 2nd, 4th, 6th, ...)

:nth-child(xn+y)
    $('tbody tr:nth-child(4n+1)')
    Selects every 4th TR in the TBODY, starting with the 1st (rows 1, 5, 9, ...)

* NOTE: The nth-child filters are 1-based, while the eq, gt, lt, even, and odd filters are 0-based
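
The 0-based versus 1-based difference in that note is easy to trip over, so here is a small simulation using a plain array in place of table rows. It is a sketch of the counting rules only, not of jQuery itself; the helper names are mine, and nthChild handles only the simple case of a positive step.

```javascript
// Rows numbered the way a person would count them: 1, 2, 3, ... 12.
const rows = Array.from({ length: 12 }, (_, i) => i + 1);

// 0-based filters work on the index within the matched set.
const eq = (n) => rows.filter((_, index) => index === n);
const gt = (n) => rows.filter((_, index) => index > n);
const lt = (n) => rows.filter((_, index) => index < n);
const even = () => rows.filter((_, index) => index % 2 === 0);

// 1-based :nth-child(an+b): keep positions equal to a*k + b for k = 0, 1, 2, ...
const nthChild = (a, b) =>
  rows.filter((position) => position >= b && (position - b) % a === 0);

console.log(eq(6));          // [ 7 ]
console.log(gt(6));          // [ 8, 9, 10, 11, 12 ]
console.log(lt(8));          // [ 1, 2, 3, 4, 5, 6, 7, 8 ]
console.log(even());         // [ 1, 3, 5, 7, 9, 11 ]
console.log(nthChild(4, 1)); // [ 1, 5, 9 ]
```

Notice that :eq(6) lands on the row a person would call the 7th, while :nth-child(4n+1) starts counting from 1 just as you would on paper.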

Still to come:

  • Selecting by a tag’s current state using filters like :checked, :disabled, :enabled, :hidden
  • Selecting by a tag’s type using filters like :button, :checkbox, etc.
  • Negating and other useful filters like :not(), :has(), :contains(text), :parent, :animated
  • Combining Filters

I hope this tutorial was helpful. I’m open to comments & suggestions, and will be doing more like this in the coming weeks.

Posted in jQuery

Cool CSS Finds

I always somehow end up finding a whole lot of interesting things while browsing the web. While looking for something CSS related earlier I stumbled across this CSS Cheat Sheet, something that I’ve actually had printed out for a long time but completely forgot about until a few minutes ago.

This find got me thinking: what other great little ‘CSS Helpers’ are out there? Being a C# developer, and not a web designer by any means, I need all the help with CSS that I can get! And that’s what this post is about: CSS tools that we server-side developers can use to get something done when a CSS pro isn’t around. I don’t know about you but I’ve got a lot of personal, “just for fun”, and “nobody outside of the company will ever see this” projects that should look passable at the very least. I’m sure I’m not alone in this.

Free CSS Grid Designers

Another really great find is the YUI CSS Grid Builder, a tool for, you guessed it, designing grids in CSS with divs! This is something I always have trouble with, and the last thing I want to do is use tables for my layout, because then I get dirty looks from whoever has to clean it up later.

A similar tool that’s more for building content pages than master pages is this CSS Grid Designer. It isn’t really set up to do the whole Header/Sidebar/Content/Maybe Footer thing, but it does really well when it comes to putting some columns on a page and making it look nice. The CSS it generates looks a bit scary to me, but it’s still a great tool in a pinch.

Validation – Making sure you did it right

The last thing to do before you hit that publish button and deploy your work of art into your company’s network is to check whether your CSS validates. The W3C provides this free CSS Validation Service. This can be useful for multiple reasons, including improving the chance that your page will look consistent across browsers. At the very least, if you mess something up it will help teach you good practices.

Posted in CSS