Data-intensive Computing

CSE4/587 Spring 2019

References & Resources

Other online material/publications

  1. C. O’Neil and R. Schutt, Doing Data Science, ISBN: 978-1-4493-5865-5, O’Reilly Media, http://shop.oreilly.com/product/0636920028529.do
  2. H. Wickham and G. Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, 1st Edition, ISBN: 978-1491910399, O’Reilly Media, 2017.
  3. Joel Grus. Data Science from Scratch: First Principles with Python, 1st Edition, ISBN: 978-1491901427, O’Reilly Media, 2015.
  4. The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (editor), Stewart Tansley (editor) and Kristin Tolle (editor), Microsoft Research (October 16, 2009), ISBN-10: 0982544200, ISBN-13: 978-0982544204. Online version available at: http://research.microsoft.com/en-us/collaboration/fourthparadigm/default.aspx
  5. The R Project, http://www.r-project.org/, last viewed Jan 2014.
  6. Project Jupyter, http://jupyter.org/, last viewed 2017.
  7. Data Science from A-Z, https://www.datascience.com/blog/ by Oracle, 2019.
  8. Predictive analytics: Google Prediction API, https://developers.google.com/prediction/docs/getting-started
  9. Cloud infrastructures: we will focus on Amazon for the infrastructure (though Windows Azure and Google App Engine are equally good): http://aws.amazon.com/documentation/
  10. Apache Spark. http://spark.apache.org/
  11. Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, Synthesis Lectures on Human Language Technologies, 2010, Vol. 3, No. 1, Pages 1-177, (doi: 10.2200/S00274ED1V01Y201006HLT007). An online version of this text is also available through UB Libraries since UB subscribes to Morgan and Claypool Publishers. Online version available at: http://lintool.github.com/MapReduceAlgorithms/index.html