Friday, 28 November 2014

R Object-oriented Programming - Book Review



I have been asked to review the book “R Object-oriented Programming” by Kelly Black, edited by Packt publishing (£14.45 for the E-Book, £27.99 for Print+E-Book).

The scope of the book is “to provide a resource for programming using the R language” and therefore it can be seen as a good and practical introduction to all the most commonly used part of R. The first 2 chapters deal with data type and data organization in R. They basically quickly review how to handle each type of data (such as integers, doubles) and how to organize them into R objects. The third chapter deals with reading data from files and save them. This chapter gives a pretty good introduction into reading and writing every sort of data, even binaries, and from a variety of sources, including from the web. Chapter 4 provides an introduction to R commands to generate random numbers, in particular it gives a thorough overview of the sample command. Chapters 5 and 6 give a good background into the use of R to manipulate string and time variables. Of particular interest throughout the book is the handling of data gathered from public source on the web. For these particular data skills in string manipulation become crucial both for handling web addresses and also to extract the actual data from the information returned by the server. For this reason I think this book does a good job in introducing these important aspect of the R language.
Chapter 7 introduces some basic programming concepts, such as if statements and loops. Chapters 8 and 9 provide a complete overview of the S3 and S4 classes and finally chapters 10 and 11 are two hands on examples on how to put together all the concepts learned in the book to solve very practical problems. In these example the reader will be guided towards the creation of powerful R programs to grade student and perform a Monte Carlo simulation.
The book is written in a very practical form, meaning that not much time is wasted explaining each function in details, readers can browse the help pages of each function for more details. This means that probably this book is not for newbies to programming languages. Most of the learning is done by exploring the lines of code provided and for this reason I think the best readers would be people familiar with a programming language, even though I do not think that readers necessarily needs some familiarity with R. However as stated on the website, the target for this book are beginners who wants to become more “fluent” with the language.
Overall, I think this book does a good of providing the reader with a strong and neat introduction to all the bits of coding required to become more comfortable writing advance scripts. For example, at the end of chapter 2 the author discuss the use of the apply set of commands. These are crucial milestone to be learned for every individual who wants to switch from a mundane use of R to a more advanced and rigorous use of the language. In my personal experience when I began using R I would often create very long script using lots of loops and if statements, which tends to greatly decrease the execution speed. As soon as I learned to master the apply set of commands I was able to reduce my code and crucially I was also able to substantially increase its executing speed. Personally I would have loved to have access to such a book back then! The use of web sources for data manipulation is also a very nice addition that as far as I know is not common in other introductory texts. Nowadays gathering data from the web has become the norm and therefore I think it is important to provide beginners with tools to handle these type of data.
The strength of this book however is in chapters 8 and 9, which provide an extensive introduction to the use of the classes S3 and S4. I think these two chapters alone would justify the price for buying it. As far as I know these concepts are generally not treated with the right attention in books for beginners. They may explain you that when you load a package then the functions you normally use, such as plot, may change their function and options. However, I never found an introductory book that provides such as exhaustive explanation of how to fully control these classes to create advance programs. Of particular interest are also the two examples provided in Chapters 10 and 11. These are practical exercises that put together all the concepts learned in the previous chapters with the purpose of creating R programs that can be easily implemented and share. Chapter 10 for example describe a neat and powerful way to create a new R program to grade students. In this chapter the reader will use all the basic programming concept learned during the course of the book and he/she will put them together for creating an R program to import grades from csv files, manipulate them and create summary statistics and plot.
In conclusion, I see a variety of uses for this book. Clearly it is targeted to post beginners who need a short way to unlock the full power of R for their daily statistical routines. However, this book does not loose its purpose after we learned to properly use the language. It is written in such a way that even for experienced R users it is a useful way to quickly look-up functions and methods that maybe they do not use very often. I sometimes forget how to use certain functions and having such a book on my office bookshelf will certainly help me in these frustrating situations. So I think it will become part of the set of references that future R user will use on a regular basis.