Data-intensive Computing

CSE4/587 Spring 2017

References & Resources

Other online material/publications

  1. The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (editor), Stewart Tansley (editor) and Kristin Tolle (editor), Microsoft Research (October 16, 2009), ISBN-10: 0982544200, ISBN-13: 978-0982544204, online version is available at: http://research.microsoft.com/en-us/collaboration/fourthparadigm/default.aspx
  2. The R Project, http://www.r-project.org/, last viewed Jan 2014.
  3. Project Jupyter, http://jupyter.org/, last viewed 2017.
  4. Predictive analytics: Google Prediction API: https://developers.google.com/prediction/docs/getting-started
  5. Cloud infrastructures: we will focus on Amazon Web Services (AWS) for the infrastructure, though Windows Azure and Google App Engine are equally good alternatives: http://aws.amazon.com/documentation/
  6. Apache Spark, http://spark.apache.org/
  7. Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, Synthesis Lectures on Human Language Technologies, 2010, Vol. 3, No. 1, pages 1-177 (doi: 10.2200/S00274ED1V01Y201006HLT007). The text is available online through UB Libraries, which subscribes to Morgan & Claypool Publishers; another online version is available at: http://lintool.github.com/MapReduceAlgorithms/index.html