Tag Archives: R

R Packages: Solving a problem using devtools in Windows

In the introduction to his book R Packages, Hadley Wickham provides a neat way of making sure that everything is set for writing your own R extensions: simply run devtools::has_devel(), which, if all goes well, should evaluate to TRUE.

This did not work for me, and I had to fix the problem on 2 separate occasions, so I felt I needed to share this info in case others are stumped by the same hurdle.

The fix I found – after a full sweaty day – was in this conversation on GitHub and I would like to break it down very quickly:

  1. Make sure you have installed Rtools from CRAN
  2. Make sure that Rtools/bin as well as Rtools/MinGW/x64/ are added to your system PATH (if you don’t know how, click here)
  3. In addition, it is recommended that you install LaTeX (the link is also found on the Rtools page mentioned in step 1)
  4. Run the following lines of code

install.packages("devtools")
library(devtools)

install_github("hadley/devtools")    # to get the latest 'pre-CRAN' package updates

find_rtools()
has_devel()    # output should be TRUE
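
If has_devel() still returns FALSE, it may help to confirm from inside R that the directories from step 2 are really visible on the PATH. The following is a minimal sketch using only base R. Searching for the word "rtools" in the PATH entries, and picking make and gcc as the tools to look for, are assumptions made for illustration and are not taken from the GitHub conversation.

```r
# Sketch: inspect the PATH as seen by the current R session.
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]

# Entries that mention Rtools (assumption: the install folder has "rtools" in its name).
rtools_entries <- grep("rtools", path_entries, ignore.case = TRUE, value = TRUE)
print(rtools_entries)

# Sys.which() returns the full path of each executable, or "" if it is not found.
tools <- Sys.which(c("make", "gcc"))
print(tools)
```

An empty result for rtools_entries, or empty strings in tools, would suggest that the PATH change has not taken effect yet; restarting R after editing the PATH is usually needed.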

Like I said, I had this problem on 2 different machines (Windows 7 and 10) and the same fix worked on both of them.

Cheers!


Filed under Computers & Internet

How I created an R function (for the first time)


I know you will probably laugh at me when you read this, especially if you’re a techie, but recently I took my R growth to a new level by creating a proper function. Actually, I was experimenting a little with the apply family of functions when it occurred to me that I should attempt to build a function to use with sapply.

First, I created a vector of random numbers and a (probably) meaningless mathematical function:

# Create a vector "vec" of random numbers
# and a mathematical function "funny_no"
(vec <- round(rnorm(1200, mean = 16, sd = 2)))
funny_no <- function(x) sqrt(x) / 2 + 3 * log10(x)

 

When I ran it, it worked very well. So I thought to myself, “What if I wanted to prevent this function from accepting negative numbers?” Well, after a little tinkering, I came up with this:

# Add conditional statements to "funny_no"
funny_no <- function(x) {
  if (x >= 0) {
    sqrt(x) / 2 + 3 * log10(x)
  } else {
    stop("Cannot use negative numbers")
  }
}

And when I ran the following lines of code I saw it was purr-fect!

vec <- round(rnorm(1200, mean = 16, sd = 2))
sapply(vec, funny_no) 

# The next line inserts negative values, so the function
# throws an error saying "Cannot use negative numbers"
vec[vec == 15]  <-  -15
sapply(vec, funny_no)
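
Since if() checks a single value, the function above relies on sapply to feed it one number at a time. As a hedged alternative, here is a minimal sketch of a vectorised variant that checks the whole vector in one go. The name funny_no_vec, the seed and the tryCatch demonstration are illustrative assumptions and not part of the original experiment.

```r
# Hypothetical vectorised variant of funny_no (illustration only).
funny_no_vec <- function(x) {
  # Reject the input if any element is negative.
  if (any(x < 0, na.rm = TRUE)) {
    stop("Cannot use negative numbers")
  }
  sqrt(x) / 2 + 3 * log10(x)
}

set.seed(42)  # arbitrary seed so the example is repeatable
vec <- round(rnorm(1200, mean = 16, sd = 2))
res <- funny_no_vec(vec)  # no sapply needed: sqrt() and log10() are vectorised

# Capture the error message instead of halting the script.
msg <- tryCatch(
  funny_no_vec(c(4, -15)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The trade-off is that this version stops on the first sign of a negative value anywhere in the vector, rather than failing part-way through an sapply loop.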

So there you have it. Time for some coffee!


Tales from the R Side

Credit: 'R' by Lorenzo Lorenzi (1772-1850)


It appears there’s been a little lull on this page, and for a good reason. I suddenly found myself caught smack in the middle of three things at once: a course on C++, an attempt to understudy Hadley Wickham via his book, Advanced R, and a Data Science course from Columbia University. I really didn’t have much of a holiday!

I thought this was worth sharing because I am hoping, really hoping, that some of my colleagues and compatriots will consider channelling some of their energies in this direction. I remember vividly how, in the late 90s, I tried to convince fellow doctors to join me in attending FREE computer classes, all to no avail. So it is not at all surprising to see the health sector in my country lagging behind in the application of ICTs. I don’t know, but I have this feeling that this area of knowledge, data science, is going to be very important in the next 5 to 10 years.

First, we’re in the middle of a ‘data boom’. Societies are now literally inundated with data, and humans are practically littering the data-sphere with their numbers: the proverbial 1s and 0s. This is something we cannot ignore, whether we are job-seekers or entrepreneurs.

Secondly, whether you like it or not, somebody somewhere is taking your data, storing it and using it for something. The prospect of not being able to swim when the world is certain to be flooded is indeed a grim one.

Thirdly, some of us have this affinity for numbers but don’t really know how to translate this into something practical, something real! Well, welcome to the age of ‘data products’, where you’re either buying or you’re selling. Period.

There are a few more things to say on this, but I’m yet to wrap my mind around them. Frankly, the whole thing is dizzying, and the speed at which the world is moving with this is scary. Many of us should be determined not to be left behind.

So, while I’m schlepping C++ syntax, or trying to figure out the rules that govern R functions (lazy evaluation really stumped me for a bit), I can already see some interesting times ahead of us.

Yes, I think I should go and watch a movie now…


A Simple Modification of Missingness Maps

 

Source: itc2.utk.edu


I am one of those who are becoming increasingly convinced that data cleaning should be done in such a way that it is open to scrutiny. The lack of such openness is one of the disadvantages of using point-and-click software for data management. As an aspiring R Jedi, I was working on a relatively large dataset (37,000+ records) that has a lot of missing values. I plotted the missing values in R using the following code:


# Load and explore data
pop <- read.csv("consolidated data.csv", na.strings = "")
dim(pop)
str(pop)
names(pop)

# Display and quantify missing values
apply(pop, 2, function(x) sum(is.na(x)))

# Plot missing values
library(Amelia)
missmap(pop)

The ensuing plot turned out like this:

[Image: missingmap1]

As one can see, this pretty plot could do with some customization so that we can tell the audience what we are mapping as well as get rid of some superfluous detail. On looking at the R documentation on the missmap function, I discovered the following default parameters (click here for full documentation):

missmap(obj, legend = TRUE, col = c("wheat", "darkred"), main,
        y.cex = 0.8, x.cex = 0.8, y.labels, y.at, csvar = NULL,
        tsvar = NULL, rank.order = TRUE, ...)

I am okay with the legend and colour scheme, but would like to give the plot a better title. Also, because there are so many records, the y-axis labels are completely illegible, and I would love to get rid of them. As I tinkered with this, I discovered that the customization is slightly different from what I would have done in the built-in graphics package. For instance, the y-axis argument I would reach for in the latter is ylab, as against y.labels in Amelia. Also, any adjustment to y.labels had to be matched by a corresponding change to the y.at argument.

After a little trial and not a few errors, I eventually came up with this:


missmap(pop,
        main = "Missingness Map of CCT Dataset",
        y.labels = NULL,
        y.at = NULL)

And this is what the plot now looks like:

[Image: missingmap2]

This looked a bit better to me (though not perfect). I’ve given it my own title and removed the jumbled up axis labels on the y-axis.

The moral of the story is that, by examining the R documentation and with a little luxury of time, I can tweak my output to suit my needs, tastes and quirks, as well as provide greater appeal for the target audience. Also, R helps us to make our data cleaning more reproducible and therefore more transparent and credible.
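
On the reproducibility point, the per-column counts of missing values computed earlier with apply can also be obtained with colSums, and turned into percentages with colMeans. The sketch below uses a small made-up data frame (toy, with invented values) as a stand-in, since the original CSV file is not available here.

```r
# Toy data frame standing in for the real dataset (values are made up).
toy <- data.frame(
  age    = c(23, NA, 41, 35, NA),
  sex    = c("F", "M", NA, "F", "M"),
  weight = c(60.5, 72.1, 80.3, NA, 55.0),
  stringsAsFactors = FALSE
)

# Number of missing values per column, two equivalent ways.
na_apply  <- apply(toy, 2, function(x) sum(is.na(x)))
na_counts <- colSums(is.na(toy))

# Percentage of missing values per column.
na_share <- round(100 * colMeans(is.na(toy)), 1)

print(na_counts)
print(na_share)
```

Both approaches give the same counts; colSums simply avoids converting the data frame to a character matrix first, which is what apply does when the columns are of mixed types.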


Facial recognition for data visualization

Yesterday, I discovered a very interesting capability in R. There’s a package called aplpack that allows you to plot Chernoff faces. Chernoff faces are a way of presenting multivariate data in which the output looks like the faces of cartoon characters. One thing that I immediately noticed was that, in using this package, I could quickly recognize (at a glance) aspects of the dataset that are similar or dissimilar.

Following the example of the article I read, I tested it using the built-in dataset in R known as mtcars (Motor Trend Car Road Tests), which looks like this:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

When you run the appropriate function in aplpack, the dataset now looks like this:

[Image: faces]

With the data visualized this way, one can quickly see which cars in the dataset are similar in the variables tested.

Neat eh?

For more information on this, take a look at the original blog post.
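
For readers who want to try this themselves, the function in question is presumably faces() from aplpack. The following is a minimal sketch that assumes the package is installed. The choice of the first 16 cars and the first 7 variables, as well as the plot title, are arbitrary assumptions made to keep the picture readable, and are not taken from the original blog post.

```r
# Sketch only: assumes the aplpack package is installed.
# Each row of the data becomes one face; the columns drive the facial features.
cars_subset <- mtcars[1:16, 1:7]  # smaller slice of the data (arbitrary choice)

if (requireNamespace("aplpack", quietly = TRUE)) {
  aplpack::faces(cars_subset, main = "Chernoff faces for mtcars")
} else {
  message("aplpack is not installed; try install.packages('aplpack') first")
}
```

Running faces(mtcars) on the full dataset should give one face per car, as in the picture above.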
