Towards a task-based search and recommender systems

Tolomei, Gabriele; Orlando, Salvatore; Silvestri, F.

doi:10.1109/ICDEW.2010.5452748

Traditional data warehouses mostly run on large, expensive server and storage systems. For small and medium-sized companies in particular, running or renting such systems is often too expensive. These companies might need analytical services only from time to time, for example at the end of a billing period. Cloud computing offers a way to overcome these problems. In this paper, we report on work in progress towards building an OLAP cluster of multi-tenant main-memory column databases in the Amazon EC2 cloud computing environment, for which we ported SAP's in-memory column database TREX to run in the Amazon cloud. We discuss early findings on the cost/performance tradeoffs between two options: reliably storing a tenant's data on a single node backed by highly available network-attached storage, such as Amazon EBS, and replicating tenant data to a secondary node where the data resides on less resilient storage. We also describe a mechanism that supports historical queries across older snapshots of tenant data, which are lazy-loaded from Amazon's S3 near-line archiving storage and cached on the local VM disks. © 2010 IEEE.
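
The lazy-loading mechanism mentioned at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `SnapshotCache`, the `fetch` callback, and the `.snap` file layout are all hypothetical names introduced here, and the archive is replaced by an in-process stand-in rather than a real S3 client. The sketch only shows the general pattern of fetching a snapshot from the archive on first access and serving later reads from the local disk.

```python
import tempfile
from pathlib import Path
from typing import Callable


class SnapshotCache:
    """Hypothetical sketch: lazy-load tenant snapshots and cache them on local disk."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str, str], bytes]):
        self.cache_dir = Path(cache_dir)
        # `fetch` is an assumed stand-in for a download from the archive tier.
        self.fetch = fetch
        self.archive_reads = 0  # counts how often the archive was actually hit

    def _local_path(self, tenant: str, snapshot_id: str) -> Path:
        # Assumed layout: one directory per tenant, one file per snapshot.
        return self.cache_dir / tenant / f"{snapshot_id}.snap"

    def load(self, tenant: str, snapshot_id: str) -> bytes:
        path = self._local_path(tenant, snapshot_id)
        if path.exists():
            # Cache hit: serve the snapshot from the local disk.
            return path.read_bytes()
        # Cache miss: fetch from the archive only now, on first access.
        data = self.fetch(tenant, snapshot_id)
        self.archive_reads += 1
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return data


def fake_archive(tenant: str, snapshot_id: str) -> bytes:
    """In-process stand-in for the archive; returns synthetic snapshot bytes."""
    return f"{tenant}:{snapshot_id}".encode()


if __name__ == "__main__":
    cache = SnapshotCache(Path(tempfile.mkdtemp()), fake_archive)
    cache.load("acme", "2010-01")  # first access goes to the archive
    cache.load("acme", "2010-01")  # second access is served from local disk
    print(cache.archive_reads)
```

In this sketch a historical query pays the archive latency only the first time it touches a given snapshot. Eviction of old cache entries and validation of tenant identifiers are left out for brevity.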