This is part one of a two-part post on Docker, PostgreSQL databases, and anomaly datasets.


In recent LinkedIn posts (the original and Rami’s repost) and a tweet, I asked the internet for their favorite datasets for anomaly detection problems, particularly in the time-series domain. I got lots of responses and now have a massive amount of data to play with. Thank you to everyone who responded.


This is part two of a two-part post on Docker, Postgres databases, and anomaly datasets. Read Part 1, which teaches you how to set up a new Postgres database using Docker.

This post describes how to populate the anomaly database built in Part 1.


Continuing the theme of end-to-end reproducible workflows, I want to be able to recreate my raw database programmatically as well.
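As a minimal sketch of what a programmatic reload could look like, the snippet below connects to a local Postgres instance with {DBI} and {RPostgres} and loads CSVs into tables. The host, port, database name, credentials, and `data/` directory are all assumptions for illustration, not details from the post.

```r
library(DBI)

# Connect to the Postgres container from Part 1
# (connection details here are hypothetical)
con <- dbConnect(
  RPostgres::Postgres(),
  host     = "localhost",
  port     = 5432,
  dbname   = "anomaly",
  user     = "postgres",
  password = Sys.getenv("PGPASSWORD")
)

# Load every CSV in data/ into its own table, named after the file
for (f in list.files("data", pattern = "\\.csv$", full.names = TRUE)) {
  tbl <- tools::file_path_sans_ext(basename(f))
  dbWriteTable(con, tbl, read.csv(f), overwrite = TRUE)
}

dbDisconnect(con)
```

Because the whole load lives in one script, dropping and recreating the raw database is a single command rather than a manual import session.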

At the end of this activity, I’m able to quickly load and manage ~6 GB of data for…

Why & How to use Docker in your daily R workflow

Source of Pain

  • One of the biggest challenges we face as data scientists working on production-grade codebases is ensuring end-to-end reproducibility and stability over time.
  • This extends to academia as well, but I’m not an academic.
  • A basic R setup has no built-in functionality to store the package versions & dependencies…

I often run data ETL pipeline scripts as cron jobs on text-only interfaces (be it Jenkins, or plain cron in a shell). While I print descriptive stats to keep tabs on ETL runs, I’ve found {txtplot} adds a higher level of fidelity to my logfiles.
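A small sketch of what this looks like in practice: alongside the usual `summary()` stats, {txtplot} renders an ASCII plot directly into the logfile. The row counts below are made-up stand-ins for real ETL metrics.

```r
library(txtplot)

# Hypothetical per-run metric: rows loaded in the last seven ETL runs
rows_loaded <- c(1.2e5, 1.9e5, 1.4e5, 2.3e5, 2.1e5, 2.8e5, 2.5e5)
run_id      <- seq_along(rows_loaded)

# Plain descriptive stats, as before
print(summary(rows_loaded))

# An ASCII scatter plot of rows per run, viewable in any text-only log
txtplot(run_id, rows_loaded, xlab = "run", ylab = "rows")
```

A sudden dip or spike is much easier to spot in the plotted trend than in a table of numbers scrolling past in a Jenkins console.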

Rahul Sangole

Data Scientist. Time Series. Anomaly Detection. R Production-ization. Shiny Visualization.
