Statistical Methods & Machine Learning in “R”
Bridging the Gap
Before data analysis was cool, there was a wide gap between domain experts in any given field & computer science.
As computers grew more powerful, we could suddenly hand over many of our data analysis tasks to them: data collection, data pre-processing, finding patterns &, more interestingly, making predictions. Ironically, none of this is new.
The concept of Artificial Intelligence was conceived in the 1950s & today we are capable of building algorithms that can approximately mimic basic cognitive functions of the human brain.
The good news is that the gap between computer science and other domains is closing rapidly. A growing number of students & researchers are learning programming languages so that they can use the power of computer science to make their work faster, easier & more accurate.
We are here to help with that process!
We have tried to build such a bridge using R, especially for students of Bioinformatics, since they already have a lot on their plate. That is by no means meant to discourage anyone else from browsing our materials: they were produced from a generic point of view & should serve as a beginner's guide to R covering
- Data Pre-Processing
- Checking for Significant Differences
- Checking for Patterns
To achieve our goal, we have created a GitHub project :
Statistical Methods & Machine Learning in R
We have archived code for the above-mentioned aspects, with explanatory guides on how to use it, & provided theoretical explanations in the wiki section of our GitHub project for a basic understanding of R & the statistical methods we implemented in R.
GitHub Wiki Contents (Theoretical Concepts) :
We have also created a series of presentations & R scripts that can serve as a complete tutorial for an individual or a group learning Statistical Methods & Machine Learning with R.
Each tutorial unit consists of presentation files along with an RScript that can be run while going through the slides. Each exercise also includes a task that lets learners assess their own progress.
There are README files in each exercise to guide you through the folders.
The files inside the downloaded folder are password protected. To obtain the password & to receive the solutions for the tasks in each exercise, please drop a mail to
Tutorial Content :
(The following permalinks lead to our RScripts, which show how each topic is implemented)
- Introduction to R & RStudio
- Data Types & Packages in R
- Reading & Writing Data in R
- Tidying Data: tidyr
- Plotting Data: ggplot2
- Correlation
- Regression
- Correlation + Regression
- Group Significance Tests
- Cluster Analysis
- Ordination
- Clustering + Ordination
- Statistical Learning Algorithms
  - Unsupervised
  - Reinforced
  - Supervised
- Machine Learning Algorithm: Decision Tree
- Deep Learning Algorithm: Artificial Neural Network
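To give a feel for what several of the topics listed above look like in practice, here is a minimal sketch in base R. It is not taken from the tutorial scripts: the built-in `iris` data set, the chosen columns and all variable names are assumptions made purely for illustration.

```r
# Illustrative sketch only: uses the built-in iris data set and base R,
# not the data or scripts from the tutorial itself.
data(iris)

# Data pre-processing: keep complete rows and scale the numeric columns.
iris_clean <- iris[complete.cases(iris), ]
measurements <- scale(iris_clean[, 1:4])

# Correlation between two measurements.
r <- cor(iris_clean$Petal.Length, iris_clean$Petal.Width)

# Regression: petal width as a linear function of petal length.
fit <- lm(Petal.Width ~ Petal.Length, data = iris_clean)

# Group significance test: one-way ANOVA of sepal length across species.
anova_table <- summary(aov(Sepal.Length ~ Species, data = iris_clean))[[1]]
anova_p <- anova_table[["Pr(>F)"]][1]

# Cluster analysis: k-means with three clusters (seed fixed for repeatability).
set.seed(42)
clusters <- kmeans(measurements, centers = 3, nstart = 10)

# Ordination: principal component analysis on the scaled measurements.
pca <- prcomp(measurements)

print(r)
print(coef(fit))
print(anova_p)
print(table(clusters$cluster, iris_clean$Species))
print(summary(pca))
```

Running the script prints the correlation coefficient, the regression coefficients, the ANOVA p-value, a cross-table of clusters against species, and the variance explained by each principal component. Packages such as tidyr and ggplot2, which appear in the list above, are deliberately left out so that the sketch runs on a plain R installation.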