Zhen Zhang's Blog

R map part 1 -- introduction

It is very common for us to visualize our results using a map. We can put the data into the map to illuminate the audiences. In this post, I will talk about the basic packages and functions that are used to draw a map in R.

Note: All the required material used in this series of post are posted on my github account. You can easily download them here.

Map and mapdata: the packages of pre-defined maps

When we are thinking of maps, what comes to our first thought is that: is there a map that I can use directly? The answer is, yes, and no. By saying yes, I say that there are many maps that can be easily drawn with the help of package “maps” and “mapdata”. By saying no, it means that many maps in these packages are out-dated. If you want a concurrent one, you have to draw it all by yourself.

Map of the world

1
2
library(maps)
map("world", fill = T, col = rainbow(15), mar = c(0, 0, 0, 0))

world

Map of the United States

First let’s draw the whole map of United States.

1
map("state", fill = TRUE, col = rainbow(20), mar = c(0, 0, 2, 0))

us1

Then what if we only need some states? It can be done by the region option:

1
map("state", region = c("california", "nevada", "arizona"), fill = TRUE, col = rainbow(3))

us2

Map of China

There is no map of China in the package of maps, so we need another package: mapdata.

1
2
library(mapdata)
map("china", col = "gray40", ylim = c(18,54))

china1

Here you can see this map is out-dated: the city of Chongqing is still in the province of Sichuan. Then we may need another way to plot the map.

MapTools – A powerful tool to parse shapefile

Introduction

The shapefile format is a popular geospatial vector data format for geographic information system (GIS) software.

There are many ways in R that can read the shapefile, including the package of “maptools”, “rgdal”, “PBSmapping”, “shapefile” and so on. In this post, I will focus on maptools. The format of shapefile is very convenient to find, and here is a resource: GADM database of Global Administrative Areas, and you can find the shapefile for every country and every region in the world.

We will walk through the shapefile property and discuss how to deal with it.

Read and parse

There is a generic function readShapeSpatial that can read all three types of information which are included in shapefile: point, line and space. Of course for these types, there are three different functions designed to handle only one type: readShapePoints, readShapeLines and readShapePoly. Now we are certain about our datatype, then we will use readShapePoly.

1
2
library(maptools)
china_map1 <- readShapePoly("mapsData/maps/bou2/bou2_4p.shp")

The R object china_map1 is very complex, which can be seen by the str() function. Indeed, It belongs to the class of SpatialPolygonsDataFrame:

1
2
3
4
5
6
class(china_map1)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
length(china_map1)
## [1] 925

There are 925 polygons, and R just follow the polygons defined by this object then line them each one up. It is confused that there are 33 provinces in China (including HongKong), but the polygons are much more. The reason is that some provinces have islands, for example, Shanghai, then there are more than 1 polygon for it.

It is not a data frame, but the operator “\$” and “[]” are reloaded in this particular type, then we can use “\$” and “[]” to subset parts of it as we want similar to what we do in a data frame. If we want to see the data hidden in SpatialPolygonsDataFrame, we can use the operator “@”:

1
2
3
4
5
china_map2 <- china_map1@data
head(china_map2)
library(ggplot2)
china_map3 <- fortify(china_map1) ## fortify() is a function in the package of ggplot2, we will talk about it later
head(china_map3)

Here we can see that china_map2 consists the 925 polygons, while china_map3 expand these 925 polygons into points. The underlying logic is the points belonging to the same polygon are in the same group, and R line all the points up to draw the polygon.

But there is a small problem here. As you see, the name variable in china_map2 is unreadable. It is because the different encoding format. The solution is iconv():

1
china_map2$NAME <- iconv(china_map2$NAME, from = 'GBK') ## The original coding format is GBK

Draw the map

It is time to draw!

1
plot(china_map1)

china2

We can see now the map for China is up-to-data: Chongqing is added!

We can quickly plot a province by referring to its NAME or ADCODE99 property:

1
2
Shanghai = china_map1[china_map1$ADCODE99 == 310000,]
plot(Shanghai)

Shanghai

As you can see here, there are 12 polygons in Shanghai.

Now we have the basic idea of how map works in R, in the next post we will draw the maps with ggplot.

References

  1. Douban R by shilululululu
  2. 用R软件绘制中国分省市地图 by 邱怡轩
  3. 用ggmap包进行地震数据的可视化 by 肖凯
  4. R绘制中国地图,并展示流行病学数据 by 姜晓东