
Sunday 31 March 2019

Setting up machine for Big Data

To set up a big data machine you need a 64-bit Linux machine. For that I am using a virtual machine created with VirtualBox by Oracle. My host is a 9-year-old AMD Athlon(tm) II X2 240 processor (2800 MHz, 2 cores, 2 logical processors) with 8 GB RAM. I am finding that the CPU is the bottleneck here, not the RAM. I allocated 4 GB RAM to the virtual machine. With this virtual machine you can run sample programs, but heavy processing can't be supported. For processing big files with a lot of data I will be using virtual machines in Google Cloud and Amazon Web Services. It turns out to be cheaper to use the cloud than to upgrade your desktop by spending 50,000 rupees. On the cloud you may spend only 5,000 rupees in a year.

You can download VirtualBox from https://www.virtualbox.org/ . You need to enable virtualization in your BIOS to create a 64-bit virtual machine. Please refer to the following YouTube video for the detailed process: https://www.youtube.com/watch?v=tv0WPJSWBQo . If you are using a different BIOS, search Google for how to enable virtualization in your BIOS.

After enabling virtualization in the BIOS and installing VirtualBox from https://www.virtualbox.org/ , download the pre-built virtual machine with Hadoop installed from https://www.cloudera.com/downloads/quickstart_vms/5-13.html . On this page you need to provide some details about yourself, such as email address, name and company. After that it will download a zip file containing one hard disk image and a configuration file. Extract that zip file into a folder such as D:\virtualbox\cloudera-quickstart-vm-5.13.0-0-virtualbox .
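If you prefer to script the extraction step, here is a minimal sketch using Python's standard zipfile module. The function name extract_vm_archive and the archive path in the usage comment are my own placeholders, not something Cloudera provides; use the actual name of the file you downloaded.

```python
import zipfile
from pathlib import Path


def extract_vm_archive(zip_path, dest_dir):
    """Unpack the downloaded archive into dest_dir.

    Returns the sorted list of member names so you can confirm that the
    disk image and the configuration file are both present.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


# Example call (hypothetical download location, adjust to your machine):
# extract_vm_archive(r"D:\downloads\quickstart-vm.zip", r"D:\virtualbox")
```

The returned list is only a convenience for checking what came out of the archive before you import it.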

Here is video tutorial

Start VirtualBox and click on "Import Appliance".
Select the VM configuration file from the directory where you extracted the Cloudera VM image.

Change the MAC Address Policy to "Generate new MAC addresses for all network adapters" and, if you want, change the name of the virtual machine.

Click on the Import button and the import will start.
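The import steps above can also be done from the command line with VirtualBox's VBoxManage tool. Below is a rough sketch that only builds the command; it does not run it. The helper name build_import_command, the file name and the VM name are placeholders I made up. As far as I know, VBoxManage import generates new MAC addresses unless you pass --options keepallmacs, which matches the policy chosen in the dialog, but please verify this against the manual for your VirtualBox version.

```python
def build_import_command(ovf_path, vm_name=None):
    """Build the argument list for 'VBoxManage import'.

    ovf_path: path to the VM configuration file from the extracted zip.
    vm_name:  optional new name for the virtual machine.
    """
    cmd = ["VBoxManage", "import", ovf_path]
    if vm_name:
        # --vsys 0 selects the first (and here only) virtual system in the file
        cmd += ["--vsys", "0", "--vmname", vm_name]
    return cmd


# Placeholder values, replace with your own path and name.
cmd = build_import_command("quickstart-vm.ovf", "bigdata-vm")
print(cmd)
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems with Windows paths that contain spaces.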

Once the VM is imported you need to change the network connection settings for the VM. I need one bridged adapter so that any PC on my local network can connect to my VM, and one host-only adapter for local communication from my Windows host machine to the virtual machine. Don't change the MAC addresses. Accept whatever is provided by default.
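For reference, the same two adapters can be configured with VBoxManage modifyvm while the VM is powered off. This is only a sketch that builds the command. The helper name and the adapter names in the example are assumptions; the real names of your host network interfaces can be listed with "VBoxManage list bridgedifs" and "VBoxManage list hostonlyifs". Putting the bridged adapter on slot 1 and the host-only adapter on slot 2 is my choice for illustration.

```python
def build_network_command(vm_name, bridged_host_nic, hostonly_host_nic):
    """Build 'VBoxManage modifyvm' arguments for two network adapters.

    Adapter 1: bridged, so other PCs on the local network can reach the VM.
    Adapter 2: host-only, for traffic between the host and the VM.
    MAC addresses are not touched, so the defaults stay in place.
    """
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--nic1", "bridged", "--bridgeadapter1", bridged_host_nic,
        "--nic2", "hostonly", "--hostonlyadapter2", hostonly_host_nic,
    ]


# Placeholder names, replace with the interfaces on your own host.
cmd = build_network_command(
    "bigdata-vm",
    "Realtek PCIe GBE Family Controller",
    "VirtualBox Host-Only Ethernet Adapter",
)
print(cmd)
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```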




Increase the CPU allocated to the virtual machine so that it can consume 100% if required. This will improve the performance of the virtual machine. Also enable "extended VT-x/AMD-V".
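The CPU and memory settings can be scripted in the same way. The sketch below builds a VBoxManage modifyvm command with the execution cap set to 100% and 4 GB of RAM as described earlier. The helper name and the VM name are placeholders, and the number of virtual CPUs is left as a parameter because it depends on your host. I have left the VT-x/AMD-V checkbox to the GUI here.

```python
def build_cpu_command(vm_name, cpus, execution_cap=100, memory_mb=4096):
    """Build 'VBoxManage modifyvm' arguments for CPU and memory.

    cpus:          number of virtual CPUs given to the VM (at least 1).
    execution_cap: percentage of host CPU time the VM may use (1 to 100).
    memory_mb:     RAM for the VM in megabytes (4096 MB = 4 GB).
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if not 1 <= execution_cap <= 100:
        raise ValueError("execution_cap must be between 1 and 100")
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--cpus", str(cpus),
        "--cpuexecutioncap", str(execution_cap),
        "--memory", str(memory_mb),
    ]


# Placeholder VM name; 2 CPUs matches the 2-core host described above.
cmd = build_cpu_command("bigdata-vm", cpus=2)
print(cmd)
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The range check is there because a cap outside 1 to 100 makes no sense as a percentage, so it is better to fail before calling VBoxManage.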


Click OK on the above dialog box. Now you can start your machine.
The VM will start in another window. The GUI will open without a password prompt, and you can use the virtual machine right away. The password for the user cloudera is cloudera, and the password for the root user is also cloudera.

Please don't enable Cloudera Manager, because it takes a lot of resources in terms of memory and CPU. We will run Cloudera Manager on a cloud instance.