Data-intensive Computing

CSE4/587 Spring 2019

References & Resources

Other online material/publications

  1. C. O’Neil and R. Schutt. Doing Data Science, ISBN: 978-1-4493-5865-5, O’Reilly Media.
  2. H. Wickham, G. Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, 1st Edition, ISBN: 978-1491910399, O’Reilly Media, 2017.
  3. Joel Grus. Data Science from Scratch: First Principles with Python, 1st Edition, ISBN: 978-1491901427, O’Reilly Media, 2015.
  4. The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (editor), Stewart Tansley (editor) and Kristin Tolle (editor), Microsoft Research (October 16, 2009), ISBN-10: 0982544200, ISBN-13: 978-0982544204. An online version is available at:
  5. The R Project, last viewed Jan 2014.
  6. Project Jupyter, last viewed 2017.
  7. Data Science from A-Z, by Oracle, 2019.
  8. Predictive analytics: Google Prediction API:
  9. Cloud infrastructures: we will focus on Amazon for the infrastructure (though Windows Azure and Google App Engine are equally good):
  10. Apache Spark.
  11. Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, Synthesis Lectures on Human Language Technologies, 2010, Vol. 3, No. 1, pp. 1-177 (doi: 10.2200/S00274ED1V01Y201006HLT007). An online version of this text is also available through UB Libraries, since UB subscribes to Morgan & Claypool Publishers. Online version available at: