Thursday, 6 June 2013

Box-plot with R – Tutorial



Yesterday I wanted to create a box-plot for a small dataset to see the evolution of 3 stations through a 3 days period. I like box-plots very much because I think they are one of the clearest ways of showing trend in your data. R is extremely good for this type of plot and, for this reason, I decided to add a post on my blog to show how to create a box-plot, but also because I want to use my own blog to help me remember pieces of code that I might want to use in the future but that I tend to forget.
For this example I first created a dummy dataset using the function rnorm() which generates random normal-distributed sequences. This function requires 3 arguments, the number of samples to create, the mean and the standard deviation of the distribution, for example: 

rnorm(n=100,mean=3,sd=1)

This generates 100 numbers (floats to be exact), which have mean equal to 3 and standard deviation equal to 1.
To generate my dataset I used the following line of code:

data<-data.frame(Stat11=rnorm(100,mean=3,sd=2),
Stat21=rnorm(100,mean=4,sd=1),
Stat31=rnorm(100,mean=6,sd=0.5),
Stat41=rnorm(100,mean=10,sd=0.5),
Stat12=rnorm(100,mean=4,sd=2),
Stat22=rnorm(100,mean=4.5,sd=2),
Stat32=rnorm(100,mean=7,sd=0.5),
Stat42=rnorm(100,mean=8,sd=3),
Stat13=rnorm(100,mean=6,sd=0.5),
Stat23=rnorm(100,mean=5,sd=3),
Stat33=rnorm(100,mean=8,sd=0.2),
Stat43=rnorm(100,mean=4,sd=4))

This line creates a data.frame with 12 columns that looks like this:



Stat11
Stat21
Stat31
Stat41
Stat12
Stat22
Stat32
Stat42
Stat13
Stat23
Stat33
Stat43
5
2
9
-3
10
4
1
1
4
1
5
9
6
13
8
3
7
3
10
10
10
5
9
8
4
4
6
0
10
6
7
6
6
8
2
7
6
7
6
3
9
1
7
0
1
0
6
0
0
2
8
1
6
8
0
8
3
10
9
8
0
19
10
0
11
10
5
6
5
8
10
1
7
4
5
-5
7
0
3
5
2
5
5
3
4
12
9
-4
7
1
9
0
7
2
1
7
7
3
9
0
11
0
8
1
7
0
7
7
6
19
8
3
10
10
9
6
0
2
8
2
6
13
6
-5
12
8
1
4
0
4
5
10
8
11
6
-1
11
4
4
1
4
6
6
10
8
13
5
-5
7
10
0
4
2
7
3
1
2
8
5
-2
5
7
4
2
7
0
3
1
8
11
7
3
11
1
0
9
2
3
5
8
4
19
5
-1
11
6
3
4
9
5
9
0
2
9
5
-3
12
7
6
4
8
2
6
8
7
10
5
-4
8
9
6
9
1
4
3
4






As I mentioned before, this should represent 4 stations for which the measure were replicated in 3 successive days.
Now, for the creation of the box-plot the simplest function is boxplot() and can be simply called by adding the name of the dataset as only argument:

boxplot(data)

This creates the following plot:

 It is already a good plot, but it needs some adjustments. It is in black and white, the box-plots are evenly spaced, even though they are from 3 different replicates, there are no labels on the axis and the names of the stations are not all reported.



So now we need to start doing some tweaking.
First, I want to draw the names of the stations vertically, instead of horizontally. This can be easily done with the argument las. So now the call to the function boxplot() becomes:

boxplot(data, las = 2)

This generates the following plot:




Next, I want to change the name of the stations so that they look less confusing. For doing that I can use the option names:

boxplot(data, las = 2, names = c("Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4"))


which generates this plot:

 




If the names are too long and they do not fit into the plot’s window you can increase it by using the option par:

boxplot(data, las = 2, par(mar = c(12, 5, 4, 2) + 0.1), names = c("Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4"))




Now I want to group the 4 stations so that the division in 3 successive days is clearer. To do that I can use the option at, which let me specify the position, along the X axis, of each box-plot:

boxplot(data, las = 2, at = c(1,2,3,4, 6,7,8,9, 11,12,13,14), par(mar = c(12, 5, 4, 2) + 0.1), names = c("Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4"))

Here I am specifying that I want the first 4 box-plots at position x=1, x=2, x=3 and x=4, then I want to leave a space between the fourth and the fifth and place this last at x=6, and so on.



If you want to add colours to your box plot, you can use the option col and specify a vector with the colour numbers or the colour names. You can find the colour numbers here, and the colour names here



Here is an example:

boxplot(data, las = 2, col = c("red","sienna","palevioletred1","royalblue2","red","sienna","palevioletred1", 
"royalblue2","red","sienna","palevioletred1","royalblue2"),
 at = c(1,2,3,4, 6,7,8,9, 11,12,13,14), par(mar = c(12, 5, 4, 2) + 0.1)
 names = c("Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4"))





Now, for the finishing touches, we can put some labels to plot.
The common way to put labels on the axes of a plot is by using the arguments xlab and ylab.
Let’s try it:



boxplot(data, ylab = "Oxigen (%)", xlab = "Time", las = 2, col = c("red","sienna","palevioletred1","royalblue2","red","sienna","palevioletred1","royalblue2","red","sienna","palevioletred1","royalblue2"),at = c(1,2,3,4, 6,7,8,9, 11,12,13,14), par(mar = c(12, 5, 4, 2) + 0.1), names = c("Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4","Station 1","Station 2","Station 3","Station 4"))


I just added the two arguments highlighted, but the result is not what I was expecting 



As you can see from the image above, the label on the Y axis is place very well and we can keep it. On the other hand, the label on the X axis is drawn right below the stations names and it does not look good.
To solve this is better to delete the option xlab from the boxplot call and instead use an additional function called mtext(), that places a text outside the plot area, but within the plot window. To place text within the plot area (where the box-plots are actually depicted) you need to use the function text().
The function mtext() requires 3 arguments: the label, the position and the line number.
An example of a call to the function mtext is the following:

mtext(“Label”, side = 1, line = 7)

the option side takes an integer between 1 and 4, with these meaning: 1=bottom, 2=left, 3=top, 4=right

The option line takes an integer with the line number, starting from 0 (which is the line closer to the plot axis). In this case I put the label onto the 7th line from the X axis.

With these option you can produce box plot for every situation.

The following is just one example:

This is the script:


data<-data.frame(Stat11=rnorm(100,mean=3,sd=2), Stat21=rnorm(100,mean=4,sd=1), Stat31=rnorm(100,mean=6,sd=0.5), Stat41=rnorm(100,mean=10,sd=0.5), Stat12=rnorm(100,mean=4,sd=2), Stat22=rnorm(100,mean=4.5,sd=2), Stat32=rnorm(100,mean=7,sd=0.5), Stat42=rnorm(100,mean=8,sd=3), Stat13=rnorm(100,mean=6,sd=0.5), Stat23=rnorm(100,mean=5,sd=3), Stat33=rnorm(100,mean=8,sd=0.2), Stat43=rnorm(100,mean=4,sd=4)) boxplot(data, las = 2, col = c("red","sienna","palevioletred1","royalblue2","red","sienna","palevioletred1","royalblue2","red","sienna","palevioletred1","royalblue2"), at = c(1,2,3,4, 6,7,8,9, 11,12,13,14), par(mar = c(12, 5, 4, 2) + 0.1), names = c("","","","","","","","","","","",""), ylim=c(-6,18)) #Station labels mtext("Station1", side=1, line=1, at=1, las=2, font=1, col="red") mtext("Station2", side=1, line=1, at=2, las=2, font=2, col="sienna") mtext("Station3", side=1, line=1, at=3, las=2, font=3, col="palevioletred1") mtext("Station4", side=1, line=1, at=4, las=2, font=4, col="royalblue2") mtext("Station1", side=1, line=1, at=6, las=2, font=1, col="red") mtext("Station2", side=1, line=1, at=7, las=2, font=2, col="sienna") mtext("Station3", side=1, line=1, at=8, las=2, font=3, col="palevioletred1") mtext("Station4", side=1, line=1, at=9, las=2, font=4, col="royalblue2") mtext("Station1", side=1, line=1, at=11, las=2, font=1, col="red") mtext("Station2", side=1, line=1, at=12, las=2, font=2, col="sienna") mtext("Station3", side=1, line=1, at=13, las=2, font=3, col="palevioletred1") mtext("Station4", side=1, line=1, at=14, las=2, font=4, col="royalblue2") #Axis labels mtext("Time", side = 1, line = 6, cex = 2, font = 3) mtext("Oxigen (%)", side = 2, line = 3, cex = 2, font = 3) #In-plot labels text(1,-4,"*") text(6,-4,"*") text(11,-4,"*") text(2,9,"A",cex=0.8,font=3) text(7,11,"A",cex=0.8,font=3) text(12,15,"A",cex=0.8,font=3)
 
 
 
 
 





Tuesday, 4 June 2013

IntR - Interactive GUI for performing geostatistical analysis in R

In 2011 I presented at the UseR conference, held in Warwick (UK), a piece of software for easing the learning curve of R for geostatistical analysis.
It is a very simple attempt to create an interactive interface in R, by using Python as GUI. From the GUI the user can easily create a script to perform some basic geostatistical analysis, such as ordinary kriging and others.
It should be fairly easy to use, even though the user needs to have some basic knowledge of GIS layers and geostatistics in order to be successful. On the other hand, I know a lot of users that are approaching R and perhaps GIS for the first time, and I think this software can be of great help in particular for these users. After a quick phase in which they need to learn some basics, such as what a ASCII grid is, I think this software should ease the learning curve of R itself, which according to different books is pretty steep.
The software is available from here: IntR - Home Page
Here you can find two versions of IntR, one for 2D geostatistics and the other for 3D stuff, plus a sample dataset and manuals.
It is completely free and you can try it and use it as long as you want, you can also modify its source code if you want to improve its functionalities. However, consider it to be in a beta-testing phase so always check its output and be careful in relying of it. I do not accept any responsibilities if the software does not work as expected or fails to produce a correct output.

Below, I attached an image the original abstract with the work I presented at the UseR conference of 2011 (the pdf is available here at page 91):



Below I put the video of the power point presentation I gave in Warwick, plus a short video tutorial (better to watch them at 720p)regarding the use of IntR for Ordinary kriging. You should also take a look at the IntR manual for more info on how to use it.





 

Monday, 29 October 2012

Terrain Attributes with the raster package

Terrain attributes can be derived by elevation values on a small neighbourhood of each DEM point.
The major attributes can be calculated from the derivatives of the topographic surface. These derivatives measure the rate of change in elevation in relation with the point location.
In literature are available several methods to compute these derivatives.
In this function the following methods have been implemented:

Evans (1980) - In this method a quadratic surface is fitted on a 3x3 elevation window and the derivatives are calculated using all the 9 cells in the windows.

Zevenbergen and Thorne (1987) - Here a more complex Lagrange polynomial is fitted to the topographic surface.

Moore et al. (1993) and Shary (1995), which are similar to the previous two. For more info refer to Florinsky (1998).

This function works with the package raster.
It takes 3 arguments:
data:  the raster object
attr:  the terrain attribute to be computed. At the moment the choice is between: "slope", "aspect", "plan.curvature" and "prof.curvature".
method: one of the four method implemented: The choice is between: "evans", "zev.tho", "moore" and "shary".

Ex.

DTM<-raster("c:/raster.asc")
slope<-DEMderiv(DTM,"slope","evans")

The output is a raster object.

This is the code:

DEMderiv Function

Wednesday, 21 September 2011

Variogram fit with RPanel

During the UseR 2011 conference I saw lots of examples of the use of RPanel to create a GUI in R. Yesterday, because I was a bit bored of the work I was doing I started thinking about this and I decided to try this package.

My objective was to create a new panel with all the main setting for fitting a variogram model to an omnidirectional variogram with gstat.

This is the script:

library(rpanel)
library(gstat)

data<-read.table("data.txt",sep="",header=T)
coordinates(data)=~Lat+Lon

##Variogram Fitting
variogram.plot <- function(panel) {
with(panel, {
variogram<-variogram(Oxigen~1,data,cutoff=Cutoff)
vgm.var<-vgm(psill=Sill,model=Model,range=Range,nugget=Nugget)
#fit<-fit.variogram(variogram,vgm.var)
plot(variogram$dist,variogram$gamma,xlab="Distance",ylab="Semivariance")
lines(variogramLine(vgm.var,maxdist=Range))
})
panel
}

var.panel <- rp.control("Variogram",Sill=20,Range=250,Nugget=0,Model="Mat",Cutoff=250)
rp.listbox(var.panel,Model,c("Mat","Sph","Gau"))
rp.slider(var.panel, Cutoff, 0,500,showvalue=T)
rp.slider(var.panel, Sill, 0,500,showvalue=T)
rp.slider(var.panel, Range, 0,1000,showvalue=T)
rp.slider(var.panel, Nugget, 0,15,showvalue=T)
rp.button(var.panel, title = "Fit", action = variogram.plot)


At this address you can find a zip file with a sample dataset that you can use to try this script, however if you know a bit of gstat you can start customizing it straigth away:


This is the screenshot from my R Console:




I also tried to embed the variogram plot into the panel in order to execute the script in batch mode, this is the result:


if(print(require(tcltk))==FALSE){install.packages("tcltk",repos="http://cran.r-project.org")}
if(print(require(tcltk))==TRUE){require(tcltk)}

if(print(require(rpanel))==FALSE){install.packages("rpanel",repos="http://cran.r-project.org")}
if(print(require(rpanel))==TRUE){require(rpanel)}

if(print(require(gstat))==FALSE){install.packages("gstat",repos="http://cran.r-project.org")}
if(print(require(gstat))==TRUE){require(gstat)}



data<-read.table(tclvalue(tkgetOpenFile()),sep="",header=T)
coordinates(data)=~Lat+Lon

grid <- spsample(data, cellsize = 10, type = "regular")
gridded(grid) <- TRUE


##Variogram Fitting
variogram.plot <- function(panel) {
with(panel, {
variogram<-variogram(Oxigen~1,data,cutoff=Cutoff)
vgm.var<-vgm(psill=Sill,model=Model,range=Range,nugget=Nugget)
#fit<-fit.variogram(variogram,vgm.var)
plot(variogram$dist,variogram$gamma,xlab="Distance",ylab="Gamma")
lines(variogramLine(vgm.var,maxdist=Range*2))
})
panel
}

replot.smooth <- function(object) {
rp.tkrreplot(var.panel, plot)
object
}


var.panel <- rp.control("Variogram",Sill=20,Range=250,Nugget=0,Model="Mat",Cutoff=250,size=c(800,600))
rp.listbox(var.panel,Model,c("Mat","Sph","Gau"))
rp.slider(var.panel, Cutoff, 0,sqrt(areaSpatialGrid(grid)),showvalue=T)
rp.slider(var.panel, Sill, 0,var(data$Oxigen)*2,showvalue=T)
rp.slider(var.panel, Range, 0,sqrt(areaSpatialGrid(grid)),showvalue=T)
rp.slider(var.panel, Nugget, 0,var(data$Oxigen),showvalue=T)
rp.button(var.panel, title = "Fit",action=replot.smooth)
rp.tkrplot(var.panel, plot,variogram.plot)
rp.block(var.panel)


It probably need some more work in order to be perfect but at the moment I'm not interested in a perfect automatic script.
If someone has time to spend on it and is able to made it perfectly automatic for every data file, please share the script with the community.

By the way, I also implemented a line from the tcltk package for the selection of the data file interactively.
This is the resulting panel:

Wednesday, 13 April 2011

CoKriging with gstat - Videotutorial

This is the last lesson of the R Videotutorial for spatial statistics.
It is all about cokriging in gstat. For this lesson I used the meuse dataset, available within gstat, for the references to this dataset take a look at the script.

The videotutorial is available at this link:
Lesson 6

The script and the dataset for the lesson are available here:
Script


P.S.
Thanks to Gerald that with his email remembered me about this lesson, which I uploaded to the site several days ago but I forgot to mentioned in the blog.

Tuesday, 12 April 2011

Box Plot with ggplot2

Hi,
in these days I'm creating lots and lots of box plot with ggplot2.
The look of them is really good and you can change every bit of code so that you can customize the plot completely.
Here is the code I'm using with a test data file to try it:
BoxPlot.zip

The result looks like this: