It is very common for us to visualize our results using a map. We can put the data into the map to illuminate the audiences. In this post, I will talk about the basic packages and functions that are used to draw a map in R.
Note: All the required material used in this series of post are posted on my github account. You can easily download them here.
Map and mapdata: the packages of pre-defined maps
When we are thinking of maps, what comes to our first thought is that: is there a map that I can use directly? The answer is, yes, and no. By saying yes, I say that there are many maps that can be easily drawn with the help of package “maps” and “mapdata”. By saying no, it means that many maps in these packages are out-dated. If you want a concurrent one, you have to draw it all by yourself.
Map of the world
Map of the United States
First let’s draw the whole map of United States.
Then what if we only need some states? It can be done by the region option:
Map of China
There is no map of China in the package of maps, so we need another package: mapdata.
Here you can see this map is out-dated: the city of Chongqing is still in the province of Sichuan. Then we may need another way to plot the map.
MapTools – A powerful tool to parse shapefile
The shapefile format is a popular geospatial vector data format for geographic information system (GIS) software.
There are many ways in R that can read the shapefile, including the package of “maptools”, “rgdal”, “PBSmapping”, “shapefile” and so on. In this post, I will focus on maptools. The format of shapefile is very convenient to find, and here is a resource: GADM database of Global Administrative Areas, and you can find the shapefile for every country and every region in the world.
We will walk through the shapefile property and discuss how to deal with it.
Read and parse
There is a generic function
readShapeSpatial that can read all three types of information which are included in shapefile: point, line and space. Of course for these types, there are three different functions designed to handle only one type:
readShapePoly. Now we are certain about our datatype, then we will use
The R object
china_map1 is very complex, which can be seen by the
str() function. Indeed, It belongs to the class of SpatialPolygonsDataFrame:
There are 925 polygons, and R just follow the polygons defined by this object then line them each one up. It is confused that there are 33 provinces in China (including HongKong), but the polygons are much more. The reason is that some provinces have islands, for example, Shanghai, then there are more than 1 polygon for it.
It is not a data frame, but the operator “\$” and “” are reloaded in this particular type, then we can use “\$” and “” to subset parts of it as we want similar to what we do in a data frame. If we want to see the data hidden in SpatialPolygonsDataFrame, we can use the operator “@”:
Here we can see that china_map2 consists the 925 polygons, while china_map3 expand these 925 polygons into points. The underlying logic is the points belonging to the same polygon are in the same group, and R line all the points up to draw the polygon.
But there is a small problem here. As you see, the name variable in china_map2 is unreadable. It is because the different encoding format. The solution is
Draw the map
It is time to draw!
We can see now the map for China is up-to-data: Chongqing is added!
We can quickly plot a province by referring to its NAME or ADCODE99 property:
As you can see here, there are 12 polygons in Shanghai.
Now we have the basic idea of how map works in R, in the next post we will draw the maps with ggplot.