July 26, 2009
Now that we have a method of predicting the delay in an individual flight leg, we need to start looking for the best routes between two airports. To do this, we will model the flight network as a graph (see “Breadth-first search” below), and assign flight arcs weights representing delay. We can now solve what is known as a shortest path problem on the network, where we attempt to determine the path between two nodes that has the least total weight, or delay. An example of a shortest path problem is determining which roads to take to go between two places while minimizing the total distance traveled; alternatively, you could attempt to navigate between the same two places while minimizing total travel time. In the first case, you would model the road network as having weights of distance; in the second, as having weights composed of distance / speed limit.
Once we have constructed our network with weights from our statistical model, we need to determine which sequence of flights to take to minimize the delay between two points. One method would be to enumerate all possible paths, calculate the total delay along each path, and choose the smallest one. However, this method runs into serious problems for all but the most trivial graphs; for even modestly sized graphs the number of paths quickly grows beyond the realm of computational feasibility.
To overcome this, we use an efficient implementation of Dijkstra’s algorithm. Dijkstra’s can solve the shortest path problem on the graphs we generate in mere hundredths of seconds. After applying Dijkstra’s to our network, we are left with a single shortest path. By generating many networks (remember that each network is generated pseudo-randomly) and solving the shortest path problem on each, the result is a collection of paths that all, at some time, can be considered shortest paths. We then plot these paths, representing the average delay as color and the number of times it was selected as a shortest path by the width of each arc. Below is an example of the result of our algorithms.

Above: Shortest paths from Boston to Los Angeles flying with American Airlines on a Monday. The legend’s units are delay in minutes.
2 Comments |
research, Uncategorized |
Permalink
Posted by Pat
July 26, 2009
The next step in our algorithm is to find a method to predict how much delay a given flight will have. One solution is to look through all previous flights with similar characteristics as the flight in question, and use the delay from those flights as an estimate. However, we are dealing with very large data sets; for example, in 2002 American airlines had over 800,000 flights. That’s just one year, and one carrier; we’re analyzing 7 different carriers across 9 years. To parse all this data would be rather time consuming, and so we utilize sampling. With sampling, we choose a number of flights randomly from the whole data set, and use just these flights as predictors.
To refine this technique, we characterize flights on a number of parameters, such as the takeoff time, the carrier, or the day of the week. However, even with these improvements, certain effects in the network are ignored, such as propagated delay. If the first leg of a flight is delayed, its most likely the case that the second leg is, too. However, with sampling we could very easily choose an on time flight to follow a horribly delayed one. To attempt to model these effects, we utilize ‘dependent sampling.’
With dependent sampling, rather than simply choosing every flight leg randomly, we instead begin by selecting all flights leaving the origin airport. We then choose only connecting flights that occurred both on the same day as the first leg and that took off after the first leg arrived. This way, historical trends of delay propagation are included in our model.
No Comments » |
research |
Permalink
Posted by Pat
July 7, 2009
The first step in our shortest path algorithm is to generate a network map of all available flights. Given an airline, we need to determine how many flights it will take to reach any other airport in the country. To do this, we use a breadth-first search algorithm.
Breadth-First Search
Given some network of nodes and arc, or ‘graph’, we want to assign every node a numeric value representing the minimum number of arcs that must be traversed from our source node to arrive at that node. We do this by searching ‘breadth-first’ instead of ‘depth-first’. This means that we will not ‘explore’ a node of depth n+1 until all nodes of depth n have first been explored. This process is demonstrated in the image below:

As can be seen, once the search is completed each node has a number associated with it representing the minimum number of arcs that must be traversed to get to it. When this is actually implemented, another value is stored at each node: the node’s parent. A node’s parent is the node that was responsible for ‘discovering’ it, and so by continuously traveling to parent nodes we will eventually return to our origin node, allowing us to find the shortest path between two nodes.
We are dealing with a network of about 3,700 airports, and each of those airports flies to dozens of other airports, and so our graph is fairly large. By utilizing the breadth-first search algorithm, we are able to restrict our attention to a much smaller network of, say, only airports within two hops of our origin.
No Comments » |
research |
Permalink
Posted by Pat
April 16, 2009
Hello!
My name is Pat Steele, and I’ve received a Chappell grant to work with professors David Phillips and Tanujit Dey of the mathematics department here at the College. Our goal is to develop an understanding of and to create statistical models of flight delay in the U.S. airline industry. We are attending a conference in August to present our findings, under the following abstract:
“A common way to arrange flight travel is to use one of the many online travel search engines. While such search engines can provide flights of the cheapest cost and even minimize the number of transfers, they do not consider the probability of delay on the flights they suggest. Using the extensive data available, we design and implement algorithms that present flight suggestions with a smaller probability of delay based on an origin-destination input. The probabilities calculated are based on our statistical models and analyses. We represent both the different flight suggestions and delay analyses on an airport network map.”
I’m currently developing a Java web applet to display the models we are developing; the goal of the applet is to graphically represent flight delay patterns and predictions, as well as allow users to query our algorithm for flights they are considering. Over the summer I will be working more directly on developing algorithms and writing code for the project.
No Comments » |
research | Tagged: research |
Permalink
Posted by Pat