Statistical Analysis

Statistical Methods & Machine Learning in “R” 

Bridging the Gap

Before data analysis was cool, a wide gap separated domain experts in any given field from computer science.
As computers grew more powerful, we could suddenly hand many of our data analysis tasks over to them: data collection, data pre-processing, finding patterns &, even more interestingly, making predictions. Ironically, none of this is new.

The concept of Artificial Intelligence was conceived in the 1950s & today we are capable of building algorithms that can approximately mimic basic cognitive functions of the human brain.

The good news is that the gap between computer science & other domains is closing rapidly. With each passing day, more students & researchers are learning programming languages to harness the power of computer science & make their work faster, easier & more accurate.

We are here to help with that process!

We have tried to build such a bridge using R, especially for students of Bioinformatics (Bio-Informatik), since they already have a lot on their plate. That is by no means meant to discourage anyone else who wants to dig through our materials: they were written from a generic point of view & should serve as a beginner's guide to R covering

  • Data Pre-Processing
  • Checking for Significant Differences
  • Checking for Patterns

To achieve our goal, we have created a GitHub project:

Statistical Methods & Machine Learning in R

We have archived code for the above-mentioned aspects, with explanatory guides on how to use it, & provided theoretical explanations in the wiki section of our GitHub project to give a basic understanding of R & of the statistical methods we implemented in R.

GitHub Wiki Contents (Theoretical Concepts):

“Click on the coloured TABS to access GitHub contents”

We have also created a series of presentations & R scripts that can serve as a complete tutorial for an individual or a group learning Statistical Methods & Machine Learning with R.

The tutorial consists of presentation files along with an R script that can be run while going through the slides. Each exercise also comes with a task that learners can use to assess their progress.

Each exercise contains Read Me files to guide you through the folders.

The files inside the downloaded folder are password-protected. To obtain the password & to receive the solutions for the tasks in each exercise, please send an email to

heyer@mpi-magdeburg.mpg.de 

 

Tutorial Content:

(The following permalinks will guide you through our R scripts to help you understand the implementation)

  1. Introduction to R & RStudio
  2. Data Types & Packages in R
    1. Data Types
    2. Packages
  3. Reading & Writing Data in R
    1. Input
    2. Output
  4. Tidying Data: tidyr
  5. Plotting Data: ggplot2
  6. Correlation
  7. Regression
  8. Correlation + Regression
  9. Group Significance Tests
    1. T-Test
    2. Mann-Whitney-U-Test
    3. Analysis of Variance (ANOVA)
    4. Kruskal-Wallis-Test
    5. Analysis of Similarities (ANOSIM)
    6. Permutational Multivariate Analysis of Variance (PERMANOVA)
  10. Cluster Analysis
    1. K-Means
    2. DBSCAN
    3. Hierarchical (Agglomerative)
  11. Ordination
    1. Principal Component Analysis (PCA)
    2. Principal Coordinate Analysis (PCoA)
    3. Non-metric Multidimensional Scaling (NMDS)
  12. Clustering + Ordination
  13. Statistical Learning Algorithms
    1. Unsupervised
    2. Reinforcement
    3. Supervised
  14. Machine Learning Algorithm: Decision Tree
  15. Deep Learning Algorithm: Artificial Neural Network

Many thanks to Julian Lange, Daniel Walke & Max Wolf for their valuable feedback & for making this tutorial possible.

Thanks to Mr Kay Schallert for the opportunity to display our content on this website.