Being Open source and free, lot of interesting development has happened in R, that help us to perform data analysis quickly. R has many excellent library that can put most costly commercial products to shame. R is powerful, though not very easy to learn. There are too many choices in R and one need to devote considerable time to understand what works best for a particular situation.
In this blog, we will see how to read data from a RDBMS – MySQL. Perform some amount of data manipulation to generate reports. We will also present data in Geo-spatial reports in addition to standard charts.
For our tutorial, we use month-wise sales data at dealer level. We also have address of dealers in our database that we will leverage to generate sales data at district level. We have this data in a relational database – MySQL. R has library to extract data from most standard data bases. Here to access this data we need to first connect R with the database. This require RMySQL package, which is currently not installed in my machine. I need to get this package. So I give a command in R:
install.packages(“RMySQL”)
It gives me a list of server to choose form. I select the closest server in India, about a minute later I get confirmation that the package is downloaded but it has failed with an error. A quick look in the terminal window tells me that it could not find mysqlclient-dev library. This error is kind of expected in a Linux system. Unlike windows, linux machines compile the code in local machine. I have not installed mysql client programs in my machine, all I have is a server. Naturally RMySQL installation fails. Error message gives me the way out is to install libmysqlclient-dev or libmariadbclient-dev. I install this library with in Ubuntu:
sudo apt-get install libmysqlclient-dev
and then again try installing RMySQL. This time installation succeeds without any problem.
Note: There are some alternate remedy suggested like installing r-client-dev of libmariadbclient-dev. These solutions do not work, at least in my Ubuntu 14.10 machine. So stick to the suggestion pointed in error message.
library(DBI)
# Connect to my-db as defined in ~/.my.cnf
con <- dbConnect(RMySQL::MySQL(), group = "my-db")
dbListTables(con)
dbWriteTable(con, "mtcars", mtcars)
dbListTables(con)
dbListFields(con, "mtcars")
dbReadTable(con, "mtcars")
# You can fetch all results:
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
dbFetch(res)
dbClearResult(res)
# Or a chunk at a time
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
while(!dbHasCompleted(res)){
chunk <- dbFetch(res, n = 5)
print(nrow(chunk))
}
# Clear the result
dbClearResult(res)
# Disconnect from the database
dbDisconnect(con)
MySQL configuration file
Instead of specifying a username and password in calls to dbConnect(), it’s better to set up a MySQL configuration file that names the databases that you connect to most commonly. This file should live in~/.my.cnf and look like:
[database_name]
option1=value1
option2=value2
If you want to run the examples, you’ll need to set the proper options in the [rs-dbi] group of any MySQL option file, such as /etc/my.cnf or the .my.cnf file in your home directory. For a default single user install of MySQL, the following code should work:
[rs-dbi]
database=test
user=root
password=