Python: Mapping 60 years of aviation crashes
Where to begin? I am going to skip the details of exactly how everything was done. For that you can look at both URL to Lib and Data to Lists. They are not the cleanest of scripts (written during my first year of Python experience), but they get the job done and I believe they are pretty well documented. Instead I will go over some issues I had and then, at the end, reward you with some graphs of aviation crashes from the last 48 years.
The first issue was that NTSB.gov documents crashes in a very roundabout way. They do not use Lat Lon for the location of crashes. Instead they use City State for the USA, or City Country for crashes outside the US. Not only that, but lots of their Locations said things like “Pacific Ocean”… How am I supposed to use that? Anyway, this meant that after I had the 586 monthly files I then had to take the Location information and turn it into Lat Lon. This was a very slow process, getting data for 156,000 crashes over 48 years. In the end I finished up with a list containing the crash Date, Lat, Lon, City/State, Aircraft License Number, and Fatalities. Now I am sure that I could put this all into one file, but I felt it made more sense to separate crashes by Month and Year.
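Since many crashes share the same City State string, one way to speed up the slow Location to Lat Lon step is to look each distinct location up only once. Here is a minimal sketch of that idea, not the code from my scripts. The names `make_cached_geocoder` and `fake_lookup` are made up for illustration, and the tiny `KNOWN_PLACES` table stands in for whatever real geocoding service you would call.

```python
from typing import Callable, Dict, Optional, Tuple

LatLon = Tuple[float, float]


def make_cached_geocoder(
    lookup: Callable[[str], Optional[LatLon]],
) -> Callable[[str], Optional[LatLon]]:
    """Wrap a slow lookup so each distinct location is resolved only once."""
    cache: Dict[str, Optional[LatLon]] = {}

    def geocode(location: str) -> Optional[LatLon]:
        # Normalize case and spacing so "denver,  co" and "Denver, CO" match.
        key = " ".join(location.upper().split())
        if key not in cache:
            cache[key] = lookup(key)
        return cache[key]

    return geocode


# Stand-in for a real geocoding service (assumption); coordinates are approximate.
KNOWN_PLACES = {
    "ANCHORAGE, AK": (61.22, -149.90),
    "DENVER, CO": (39.74, -104.99),
}
calls = []  # records every time the "slow" lookup is actually hit


def fake_lookup(name: str) -> Optional[LatLon]:
    calls.append(name)
    # Vague entries such as "Pacific Ocean" are not in the table and give None.
    return KNOWN_PLACES.get(name)


geocode = make_cached_geocoder(fake_lookup)
locations = ["Denver, CO", "denver,  co", "Pacific Ocean", "Anchorage, AK"]
results = [geocode(loc) for loc in locations]
```

Here the two Denver entries trigger a single lookup, and the vague “Pacific Ocean” entry comes back as `None` so it can be skipped or handled by hand later.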
The next step was thinking about how to visualize all this information. First I visualized every crash… this resulted in a mess of information. North America was under a sea of white dots. Then I realized I would have to take a bit of a morbid route and only graph crashes that sadly resulted in a death. This helped the situation tremendously… but caused a new issue. There were only 6,000 crashes outside of the US for the 48 years. That is about 125 crashes per year, or roughly 10 per month, spread across the rest of the world… so you would only occasionally see a blip in Russia or Australia, all the while the USA has tons of information being displayed but you can barely see it. This led me to drop the idea of showing Earth and I just focused in on North America. The end format is still under consideration. I may get rid of Alaska, or tear Alaska off from Canada and place it somewhere in the Pacific Ocean, allowing me to zoom in even more.
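The filtering described above can be sketched roughly like this. It is an illustration rather than my actual script: the `Crash` record just mirrors the fields listed earlier, the bounding box for North America is a rough guess of mine, and the sample rows (including the registration numbers) are invented.

```python
from typing import List, NamedTuple


class Crash(NamedTuple):
    date: str
    lat: float
    lon: float
    location: str
    registration: str
    fatalities: int


# Rough bounding box for North America (assumed values, tune to taste).
LAT_MIN, LAT_MAX = 7.0, 84.0
LON_MIN, LON_MAX = -170.0, -50.0


def fatal_in_north_america(crashes: List[Crash]) -> List[Crash]:
    """Keep only crashes with at least one death inside the bounding box."""
    return [
        c
        for c in crashes
        if c.fatalities > 0
        and LAT_MIN <= c.lat <= LAT_MAX
        and LON_MIN <= c.lon <= LON_MAX
    ]


# Invented sample rows, for illustration only.
sample = [
    Crash("1985-07-04", 39.74, -104.99, "Denver, CO", "N00001", 2),
    Crash("1985-07-09", 61.22, -149.90, "Anchorage, AK", "N00002", 0),
    Crash("1985-08-01", -33.87, 151.21, "Sydney, Australia", "VH-XXX", 1),
]
kept = fatal_in_north_america(sample)
```

Only the Denver row survives: the Anchorage crash had no fatalities and the Sydney crash falls outside the box.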
Those are a few of the issues this project brought. Overall it was a fun project and took lots of problem solving. Now time for some rough drafts of my graphs. These graphs do a pretty good job of displaying the data, but I think some math needs to be displayed along with the graphs. For instance, July looks like a dangerous month to fly, but this data may be skewed by the fact that July has a larger volume of air traffic.
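One piece of math that could help is turning raw counts into a rate, for example crashes per 100,000 flights. I do not have traffic numbers in my data set, so the figures below are completely made up and only show the calculation. The function name `crashes_per_100k_flights` is mine as well.

```python
from typing import Dict


def crashes_per_100k_flights(
    crashes_by_month: Dict[str, int],
    flights_by_month: Dict[str, int],
) -> Dict[str, float]:
    """Convert raw monthly crash counts into crashes per 100,000 flights."""
    return {
        month: 100_000 * crashes_by_month[month] / flights_by_month[month]
        for month in crashes_by_month
    }


# Invented numbers: July has more crashes but also far more flights.
crashes = {"Jan": 300, "Jul": 500}
flights = {"Jan": 1_000_000, "Jul": 2_000_000}

rates = crashes_per_100k_flights(crashes, flights)
print(rates)  # {'Jan': 30.0, 'Jul': 25.0}
```

With these invented numbers July has the higher raw count but the lower rate, which is exactly the kind of skew a plain count graph cannot show.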
A graph showing incidents by month. This is a collection of the 48 years, not a snapshot of a single year.
This is the same graph as above, but now you also see the death toll for each month. Pretty gruesome, and honestly not the numbers I was expecting. This graph does, however, support what I stated earlier about the data being misleading. You can see June has the most crashes, but Nov, Dec, and Jan are most likely the deadliest times to go in the air (statistically). The amount of info on this graph may be a bit too much.
Crashes separated by the day of the week they occurred on.
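For the curious, a day-of-week breakdown like this can be produced with just the standard library. This is a small sketch that assumes the crash dates are stored as `YYYY-MM-DD` strings, which may not match the format in the NTSB files. The sample dates are invented.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def crashes_by_weekday(dates: Iterable[str], fmt: str = "%Y-%m-%d") -> Dict[str, int]:
    """Count crashes per day of the week, always returning all seven days."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(DAYS[datetime.strptime(d, fmt).weekday()] for d in dates)
    return {day: counts[day] for day in DAYS}


# Invented dates: 2024-01-01 and 2024-01-08 are both Mondays.
sample_dates = ["2024-01-01", "2024-01-06", "2024-01-07", "2024-01-08"]
weekday_counts = crashes_by_weekday(sample_dates)
```

Returning all seven days in a fixed order, even the ones with zero crashes, keeps the bars in the same position from one graph to the next.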
This graph shows how you can have a lot of information but not display anything meaningful. What you are looking at is the number of crashes accumulated over 48 years. But the Y-axis has such a large max value that the information gets lost. (The X-axis is time.)
Honestly, disregard this graph; nothing meaningful is represented by this method.
The blue line shows the number of non-fatal crashes per year, while the red line shows the number of fatal crashes per year.
And now for an ugly pie chart. I am still not sure of the best way to show this. What you are looking at is crash locations over the 48 years. One thing that you can see in this graph is the small number of crashes outside the USA. That is at the 1 o’clock position, labeled “Outside USA”.
I kind of just like the way this looks, and it reinforces the problem I had at the start. There is not enough crash data “Outside USA” to allow a full globe of the Earth to be visualized without being boring.