Tag Archives: data science

My experience doing R trainings at work

Recently, the office decided to set up a small team to manage its social media presence. Because I had somewhat encouraged the development, I was asked to work with them, at least as a facilitator.

Somewhere down the line, I suggested to some on the team that they should consider carrying out analysis of the social media data, at least beyond the metrics that were already available on most of those sites.

I quickly put together a very rudimentary but useful Shiny app (not without some inspiration from this guy), just to demonstrate a bit of what was possible, and they were eager for me to train them in the use of R. I will share more about the app sometime later.

Screenshot of the Shiny app developed for the team

My aim was (and still is) to get them to a point where they could carry out basic analyses on their own and grow from there. I tried to keep the material as basic and non-intimidating as possible – some of the students admitted to a morbid fear of statistics and I didn’t want to scare them off with anything too tough.

I consider myself a beginner still, so this experience really broadened my own understanding of the language. And I had a lot of fun doing it.

Well, I put together some slides on the training sessions and felt I should share them and hopefully get some feedback. Here they are:

  1. Introduction to R Programming
  2. R Data Structures – starting them off on vectors
  3. R Data Structures (Pt. II) – diving into the basics of data frames
  4. R Data Structures (Pt. III) – examining ways of working with matrices
  5. R Data Structures (Pt. IV) – lists (and lists)
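For readers who have not met these four structures yet, here is a minimal sketch of each. The data and the variable names (likes, posts, grid, report) are invented for illustration and are not taken from the slides.

```r
# A vector: values of a single type (toy numbers, made up for this example)
likes <- c(120, 85, 230)

# A data frame: columns of equal length, possibly of different types
posts <- data.frame(
  platform = c("Twitter", "Facebook", "Instagram"),
  likes = likes,
  stringsAsFactors = FALSE
)

# A matrix: a two-dimensional structure holding a single type
grid <- matrix(1:6, nrow = 2)

# A list: a container whose elements can be anything, even other structures
report <- list(total = sum(likes), platforms = posts$platform, grid = grid)

str(report)
```

The progression in the slides follows the same order: each structure builds on the one before it, and the list can hold all of the others.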

The good thing is that some friends and colleagues (outside the office) have told me that, in the coming year, they would like me to train them as well in the use of R.

For me, it’s yet another opportunity to learn more.

Filed under Computers & Internet

How I created an R function (for the first time)

I know you will probably laugh at me when you read this, especially if you’re a techie, but recently I took my R growth to a new level by creating a proper function. Actually, I was experimenting a little with the apply family of functions when it occurred to me that I should attempt to write my own function to pass to sapply.

First I created a vector of random numbers and a (probably) meaningless mathematical function:

# Create a vector "vec" of random numbers
# and a mathematical function "funny_no"
(vec  <-  round(rnorm(1200, mean = 16, sd = 2)))
funny_no  <-  function(x) sqrt(x)/2 + 3*log10(x) 
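Trying the function out might look like the sketch below. The fixed seed and the names via_sapply and direct are assumptions added here for reproducibility and illustration. Because sqrt() and log10() are already vectorized, this first version of funny_no can also be called on the whole vector at once, and the two approaches should agree.

```r
set.seed(42)  # assumption: a fixed seed so the run is reproducible
vec <- round(rnorm(1200, mean = 16, sd = 2))
funny_no <- function(x) sqrt(x) / 2 + 3 * log10(x)

# Apply the function one element at a time
via_sapply <- sapply(vec, funny_no)

# Call it directly on the whole vector
direct <- funny_no(vec)

# Compare the two results
all.equal(via_sapply, direct)
head(direct)
```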

When I ran it, it worked very well. So I thought to myself, “What if I wanted to prevent this function from accepting negative numbers?” Well, after a little tinkering, I came up with this:

# Add conditional statements to "funny_no"
funny_no <- function(x) {
  if (x >= 0) {
    sqrt(x)/2 + 3*log10(x)
  } else {
    stop("Cannot use negative numbers")
  }
}

And when I ran the following lines of code I saw it was purr-fect!

vec <- round(rnorm(1200, mean = 16, sd = 2))
sapply(vec, funny_no) 

# The next line inserts negative values, so the function
# throws an error saying "Cannot use negative numbers"
vec[vec == 15]  <-  -15
sapply(vec, funny_no)
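The guard above tests one value at a time, which is exactly what sapply feeds it. If the function were called directly on a whole vector, if would receive a condition with more than one element, which is an error from R 4.2.0 onwards. Below is a sketch of one possible vectorized variant. The name funny_no_vec is hypothetical, and tryCatch is used only to show the error message without halting the script.

```r
# Hypothetical vectorized variant: check the whole input before computing
funny_no_vec <- function(x) {
  if (any(x < 0)) {
    stop("Cannot use negative numbers")
  }
  sqrt(x) / 2 + 3 * log10(x)
}

# Works on a whole vector in one call
funny_no_vec(c(4, 100))

# Capture the error message instead of stopping the script
result <- tryCatch(
  funny_no_vec(c(16, -15)),
  error = function(e) conditionMessage(e)
)
result  # "Cannot use negative numbers"
```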

So there you have it. Time for some coffee!

Filed under Computers & Internet

Tales from the R Side

Credit: 'R' by Lorenzo Lorenzi (1772-1850)

It appears there’s been a little lull on this page, and for a good reason. I suddenly found myself caught smack in the middle of three things at once: a course on C++, an attempt to understudy Hadley Wickham via his book, Advanced R, and a Data Science course from Columbia University. I really didn’t have much of a holiday!

I thought this was worth sharing because I am hoping – really hoping – that some of my colleagues and compatriots will consider channelling some of their energies in this direction. I remember vividly how, in the late 90s, I tried to convince fellow doctors to join me in attending FREE computer classes, all to no avail. So it is not at all surprising to see the health sector in my country lagging behind in the application of ICTs. I don’t know, but I have this feeling that this area of knowledge – data science – is going to be very important in the next 5–10 years.

First, we’re in the middle of a ‘data boom’. Societies are now literally inundated with data, and humans are practically littering the data-sphere with their numbers – the proverbial 1s and 0s. This is something we cannot ignore, whether we are job-seekers or entrepreneurs.

Secondly, whether you like it or not, somebody somewhere is taking your data, storing it and using it for something. The prospect of not being able to swim when the world is certain to be flooded is indeed a grim one.

Thirdly, some of us have this affinity for numbers but don’t really know how to translate this into something practical, something real! Well, welcome to the age of ‘data products’, where you’re either buying or you’re selling. Period.

There are a few more things to say on this but I’m yet to wrap my mind around them. Frankly, the whole thing is dizzying and the speed at which the world is going with this is scary. Many of us should be determined not to be left behind.

So, while I’m schlepping C++ syntax, or trying to figure out the rules that govern R functions (lazy evaluation really stumped me for a bit), I can already see some interesting times ahead of us.
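For anyone else stumped by lazy evaluation, here is a small sketch of the idea: R does not evaluate a function argument until the body actually uses it. The function names lazy_demo and default_demo are made up for this illustration.

```r
# An argument that is never used is never evaluated,
# so the stop() call passed as y does not fire.
lazy_demo <- function(x, y) {
  x * 2
}
lazy_demo(5, stop("this is never evaluated"))  # returns 10

# A default argument is evaluated inside the function, at first use,
# so it sees the value of a after a has been changed.
default_demo <- function(a, b = a * 2) {
  a <- a + 1
  b
}
default_demo(1)  # returns 4, not 2
```

When an argument must be evaluated up front, base R provides force() for that purpose.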

Yes, I think I should go and watch a movie now…

Filed under Computers & Internet