The Complete Magazine on Open Source

Open Source in enterprise: A change in the climate


Open source software is freely available for enterprises to use. It is well known that major corporations such as Google, Facebook, Amazon and Bank of America, among many others, use open source software. This article highlights the open source software available to enterprises for different purposes.

Enterprise class IT is mainly characterised by the following key attributes—reliability, scalability, security, manageability and serviceability. The enterprise hardware and software stack needs to be highly reliable to cater to mission-critical workloads. The workload running on the enterprise IT infrastructure may require more resources from time to time; hence, it needs to be designed to scale seamlessly in all aspects. Since much crucial and confidential data is stored and analysed on IT machines, security becomes an important aspect of the IT infrastructure, which should be built with superior security features.

Periodic updates are needed for the IT infrastructure in terms of software upgrades, bug fixes and hardware replacement. This demands that the enterprise IT infrastructure should have a swift serviceability policy.

Enterprise class IT typically consists of a large number of resources like servers, storage and networking and various software products running on them. Having simple and easy-to-use manageability solutions will help in better resource utilisation and higher productivity.

Open source in the enterprise – the change has begun

The enterprise segment is undergoing changes like never before with the advent of the cloud consumption model. There is a great push from customers for more cost-effective solutions.
Large organisations are looking at viable alternatives that allow more agility, helping them to stay ahead in the market by innovating speedily. It is no surprise that reducing operational IT expenditure, while simultaneously increasing the level of security and software capabilities, is a top priority for most enterprises. Open source products and processes address these requirements in the following ways:

  • Greater transparency: Not only is the source code available, but all of the design deliberations are out in the open, in contrast to the secretive processes of proprietary vendors. This transparency makes it easy to assess the product and its community before deciding whether to use it.
  • Greater innovation: Since open source solutions often have a large community of experienced developers volunteering to work on them, the pace of innovation is fast and continuous. This holds true in another way too: since anyone can contribute code, an open source product can incorporate unusual use cases that a proprietary vendor may either ignore or dismiss.
  • Cost-effectiveness: Open source solutions are generally much more cost-effective than proprietary solutions, and they also give enterprises the ability to start small and scale up (more on that coming up). Given that enterprises are often budget challenged, it just makes financial sense to explore open source solutions.
  • Speed: Your enterprise will soon be competing on speed, if it isn’t already doing so. Open source enables speed. A great advantage of open source is the ability to take the community versions, get started, understand whether they can solve your business problem, and begin to deliver value right away. Once you make that decision, professional support and services are increasingly available for open source products. This allows you to get the best of both worlds: flexibility, agility and the ability to get started quickly and inexpensively, with the option to mature to a large-scale, fully supported, enterprise-grade implementation. And you don’t have to clear proprietary licensing hurdles to get there.

Figure 1: Typical enterprise server architecture

Figure 2: Current enterprise market demands

According to the Gartner report, ‘Survey Analysis: Overview of Preferences and Practices in the Adoption and Usage of Open-Source Software, Gartner Reports, 2011’, more than half the organisations surveyed have adopted open source software (OSS) solutions as part of their IT strategy, with nearly one-third citing the benefits of flexibility, increased innovation, shorter development times and faster procurement processes. OSS makes up nearly one-third of the surveyed organisations’ overall enterprise software portfolio (applications and infrastructure). Interestingly, this is about the same as the proportion of internally developed software. The presence and influence of open source software have expanded across software market segments for more than a decade. It is pervasive in many areas of IT, and continues to emerge and expand across others at a steady pace. Consequently, mainstream IT organisations cannot ignore the influence and presence of OSS in their technology road maps (planned or unplanned). Those that do risk facing technical and legal nightmares, and might miss out on significant competitive business value, the Gartner report concluded.

Let us look at the top open source alternatives to some of the current proprietary enterprise components which can be adopted by large enterprises.

Figure 3: Open source adoption in the enterprise

Open source software in the enterprise

Operating system
The most popular choice for an open source operating system is Linux, due to its vast community support and adoption. There are multiple Linux distributions for enterprises, which vary in the packages bundled and the certified applications that accompany them. Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES), Oracle Linux and Ubuntu Server are a few.

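Linux is also a favoured OS for container deployments, because the kernel provides the isolation and resource-control features, such as namespaces and control groups (cgroups), that containers build on.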

Virtualisation software
Virtualisation methods can be categorised by how they present hardware to a guest operating system and emulate the guest operating environment. Primarily, there are three types of virtualisation:

  • Emulation
  • Para-virtualisation
  • Container-based virtualisation

Emulation: Emulation, also known as full virtualisation, runs the virtual machine OS kernel entirely in software. The hypervisor used in this type is known as a Type 2 hypervisor. It is installed on top of the host operating system and is responsible for translating the guest OS kernel code into software instructions. The translation is done entirely in software and requires no hardware involvement. Emulation makes it possible to run any unmodified operating system that supports the environment being emulated. The downside of this type of virtualisation is the additional system resource overhead, which leads to lower performance compared to other types of virtualisation.
Oracle VM VirtualBox and QEMU are the top choices for the emulation kind of virtualisation.

Figure 4: Linux technology for supporting containers

Figure 5: Type 2 hypervisor

QEMU is a hosted virtual machine monitor. It emulates CPUs through dynamic binary translation and provides a set of device models, enabling it to run a variety of unmodified guest operating systems. It can also be used with KVM to run virtual machines at near-native speed (requiring hardware virtualisation extensions on x86 machines). QEMU can also be used purely for CPU emulation for user-level processes, allowing applications compiled for one architecture to be run on another.
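To make the idea of running guest code ‘entirely in software’ concrete, here is a minimal Python sketch of an interpreter loop for a made-up guest machine. The three opcodes (LOAD, ADD, HALT) and the register names are invented for illustration; this is a plain interpreter, not QEMU’s dynamic binary translation, which is considerably more sophisticated.

```python
# Toy illustration only: a hypothetical three-instruction guest machine.
# Every guest instruction is decoded and carried out by host software,
# which is where the overhead of pure emulation comes from.

def run_guest(program, registers=None):
    """Interpret a list of (opcode, *operands) tuples entirely in software."""
    regs = dict(registers or {})
    pc = 0  # guest program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "LOAD":      # LOAD reg, constant
            regs[args[0]] = args[1]
        elif op == "ADD":     # ADD dest, src  (dest = dest + src)
            regs[args[0]] = regs[args[0]] + regs[args[1]]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown guest opcode: {op}")
        pc += 1
    return regs


guest_program = [
    ("LOAD", "r0", 2),
    ("LOAD", "r1", 40),
    ("ADD", "r0", "r1"),
    ("HALT",),
]
final_registers = run_guest(guest_program)
print(final_registers)
```

Each guest instruction costs several host operations (fetch, decode, dispatch), which is why hardware-assisted approaches such as KVM are preferred when near-native speed matters.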

Oracle VM VirtualBox (formerly Sun VirtualBox) is a free and open source hypervisor for x86 computers, currently being developed by Oracle Corporation. VirtualBox may be installed on a number of host operating systems, including Linux, OS X, Windows, Solaris and OpenSolaris. There are also ports to FreeBSD and Genode. It supports the creation and management of guest virtual machines that run versions and derivations of Windows, Linux, BSD, OS/2, Solaris, Haiku and OSx86, as well as limited virtualisation of OS X guests on Apple hardware.

Para-virtualisation: This type uses what is known as a Type 1 hypervisor, which runs directly on the hardware, or ‘bare metal’, and provides virtualisation services directly to the virtual machines running on it. It helps the operating system, the virtualised hardware and the real hardware to collaborate to achieve optimal performance. These hypervisors typically have a rather small footprint and do not, themselves, require extensive resources.

Featured below are the top two open source para-virtualisation products.

KVM (Kernel-based Virtual Machine) is a full virtualisation solution for Linux on x86 hardware containing virtualisation extensions (Intel VT or AMD-V). It consists of a loadable kernel module, kvm.ko, that provides the core virtualisation infrastructure and a processor-specific module like kvm-intel.ko or kvm-amd.ko.
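Since KVM depends on the processor’s virtualisation extensions, it can be useful to check for them before installing it. The sketch below assumes an x86 Linux host, where `/proc/cpuinfo` lists CPU flags and the `vmx` flag indicates Intel VT while `svm` indicates AMD-V. The function name and the sample text are illustrative.

```python
def hardware_virt_support(cpuinfo_text):
    """Return 'Intel VT', 'AMD-V' or None, based on x86 /proc/cpuinfo flags."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # A flags line looks like: "flags : fpu vme ... vmx ..."
            _, _, value = line.partition(":")
            flags = value.split()
            if "vmx" in flags:
                return "Intel VT"
            if "svm" in flags:
                return "AMD-V"
    return None


# Hypothetical excerpt standing in for the real file contents.
sample_cpuinfo = "processor\t: 0\nflags\t\t: fpu vme de pse vmx sse sse2\n"
print(hardware_virt_support(sample_cpuinfo))
```

On a real host, one would pass in the contents of `/proc/cpuinfo` instead of the sample string. Note that the extensions can be present in the CPU but disabled in firmware, so this check is a first step rather than a guarantee that KVM will work.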

Using KVM, one can run multiple virtual machines running unmodified Linux or Windows images. Each virtual machine has private virtualised hardware—a network card, disk, graphics adapter, etc. The kernel component of KVM is included in mainline Linux, as of 2.6.20. The user space component of KVM is included in mainline QEMU, as of 1.3. Blogs by people active in KVM-related virtualisation development are syndicated at http://planet.virt-tools.org/.

Xen hypervisor is an open source Type 1 or bare metal hypervisor, which makes it possible to run many instances of an operating system, or different operating systems, in parallel on a single machine (or host). The Xen Project describes it as the only Type 1 hypervisor that is available as open source. It is used as the basis for a number of different commercial and open source applications, such as server virtualisation, Infrastructure as a Service (IaaS), desktop virtualisation, security applications, and embedded and hardware appliances. The Xen Project hypervisor also powers some of the largest clouds in production today.

Figure 6: Para-virtualisation

Container-based virtualisation: Container-based virtualisation, also known as operating system-level virtualisation, enables multiple isolated executions within a single operating system kernel. It has the best possible performance and density, while featuring dynamic resource management. The isolated virtual execution environment provided by this type of virtualisation is called a container and can be viewed as a traced group of processes.
Containers have also sparked an interest in microservice architecture, a design pattern for developing applications in which complex applications are broken down into smaller pieces that work together. Each component is developed separately, and the application is then simply the sum of its constituent components. Each piece, or service, can reside inside a container, and can be scaled independently of the rest of the application as the need arises.

Undoubtedly, one of the biggest reasons for the recent interest in container technology has been the Docker open source project, a command line tool that makes creating and working with containers easy for developers and sys admins alike, much as Vagrant makes it easier for developers to explore virtual machines.

Docker is a command line tool for programmatically defining the contents of a Linux container in code, which can then be versioned, reproduced, shared and modified easily just as if it were the source code to a program.
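As a rough illustration of ‘defining the contents of a container in code’, the Python sketch below renders the text of a minimal Dockerfile. `FROM`, `WORKDIR`, `COPY` and `CMD` are standard Dockerfile instructions; the base image tag, the `/app` directory and the `app.py` file name are assumptions chosen for the example, and the helper function itself is hypothetical.

```python
import json


def render_dockerfile(base_image, app_file, command):
    """Build the text of a minimal Dockerfile from a few parameters."""
    lines = [
        f"FROM {base_image}",           # image to start from
        "WORKDIR /app",                 # working directory inside the container
        f"COPY {app_file} /app/",       # add the application file to the image
        f"CMD {json.dumps(command)}",   # exec-form command run at start-up
    ]
    return "\n".join(lines) + "\n"


dockerfile_text = render_dockerfile(
    "python:3-slim", "app.py", ["python", "app.py"]
)
print(dockerfile_text)
```

Because the result is plain text, it can be committed to version control, reviewed and shared like any other source file, which is the property the paragraph above describes.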

Cloud platforms
Cloud Foundry is a leading open source Platform-as-a-Service (PaaS) for continuous innovation. The platform is available either from the Cloud Foundry Foundation as open source software, or from multiple commercial providers as a product or a service.
Deploying it involves interfacing with the underlying infrastructure using Cloud Foundry BOSH (BOSH Outer Shell), another open source deployment tool from Pivotal. The Baidu website is implemented on Cloud Foundry.

Orchestration tools
Depending on the cloud platform adopted, the following are some of the top orchestrators.

  • Cloud Foundry’s Diego: Diego is a container management system that combines a scheduler, runner and health manager. It is a rewrite of the Cloud Foundry runtime.
  • Docker Swarm: Docker Swarm provides native clustering functionality for Docker containers, which lets you turn a group of Docker engines into a single, virtual Docker engine.
  • Kubernetes: Kubernetes is an orchestration system for Docker containers. It handles scheduling and manages workloads based on user-defined parameters.
  • Mesosphere Marathon: Marathon is a container orchestration framework for Apache Mesos that is designed to launch long-running applications. It offers key features for running applications in a clustered environment.
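To give a feel for what ‘scheduling based on user-defined parameters’ means, here is a toy first-fit scheduler in Python. It is not the algorithm used by Kubernetes, Swarm, Diego or Marathon; the node names, workload names and resource figures are all invented for the example.

```python
def schedule(workloads, nodes):
    """First-fit placement: put each workload on the first node that
    still has enough free CPU and memory for its declared request."""
    free = {name: dict(capacity) for name, capacity in nodes.items()}
    placement = {}
    for workload, need in workloads.items():
        for node, available in free.items():
            if (available["cpu"] >= need["cpu"]
                    and available["mem_mb"] >= need["mem_mb"]):
                available["cpu"] -= need["cpu"]
                available["mem_mb"] -= need["mem_mb"]
                placement[workload] = node
                break
        else:
            placement[workload] = None  # no node can host this workload
    return placement


# Hypothetical cluster and user-defined resource requests.
nodes = {
    "node-a": {"cpu": 2, "mem_mb": 2048},
    "node-b": {"cpu": 4, "mem_mb": 8192},
}
workloads = {
    "web": {"cpu": 1, "mem_mb": 512},
    "db": {"cpu": 2, "mem_mb": 4096},
    "cache": {"cpu": 1, "mem_mb": 1024},
    "batch": {"cpu": 4, "mem_mb": 1024},
}
placement = schedule(workloads, nodes)
print(placement)
```

Real orchestrators add much more on top of this, such as health checks, rescheduling after node failure and placement constraints, but the core step of matching declared requirements to available capacity is similar in spirit.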

Figure 7: Container based virtualisation

Big Data applications
Apache Cassandra is a free and open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple data centres, with asynchronous masterless replication allowing low latency operations for all clients.
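Masterless replication can be pictured as a ring of nodes in which each key hashes to a position and is stored on the next few nodes clockwise. The Python sketch below shows that idea under simplifying assumptions: one token per node, SHA-256 as a stand-in hash, and invented node names. It is not Cassandra’s actual partitioner or replication strategy.

```python
import bisect
import hashlib


def token(value):
    """Map a string to an integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy hash ring with one token per node and no master node."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        self.tokens = sorted((token(name), name) for name in nodes)

    def replicas(self, key):
        """Walk clockwise from the key's position and collect the nodes
        that should hold a copy of the key."""
        start = bisect.bisect_right(self.tokens, (token(key), ""))
        owners = []
        for step in range(len(self.tokens)):
            node = self.tokens[(start + step) % len(self.tokens)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.replication_factor:
                break
        return owners


ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
print(ring.replicas("user:42"))
```

Any node can compute the same replica list for a key, so a client can contact any of them; there is no single coordinator whose failure would make the data unavailable.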

Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is an Apache project, sponsored by the Apache Software Foundation.

Apache Spark is an open source cluster computing framework. Originally developed at the University of California, Berkeley’s AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance. It provides programmers with an application programming interface centred on a data structure called the resilient distributed dataset (RDD), a read-only multi-set of data items distributed over a cluster of machines, which is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear data flow structure on distributed programs. MapReduce programs read input data from the disk, map a function across the data, reduce the results of the map, and store reduction results on the disk. Spark’s RDDs function as a working set for distributed programs that offer a (deliberately) restricted form of distributed shared memory.
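The linear ‘map, then reduce’ data flow described above can be sketched in a few lines of plain Python. This is a single-process word count using only the standard library, not Spark or Hadoop code; the input lines and function names are invented for the example.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map: turn one input record into a partial word count."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial word counts into one."""
    return left + right


lines = [
    "open source in the enterprise",
    "the enterprise adopts open source",
]
word_counts = reduce(reduce_phase, map(map_phase, lines), Counter())
print(word_counts)
```

In a real MapReduce job, the input and the reduced output live on disk and the map calls run on many machines. Spark’s RDDs aim to keep such intermediate working sets available across steps, which helps with iterative workloads that would otherwise reread data from disk on every pass.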

Databases
PostgreSQL is an object-relational database: essentially a relational database, but with an object-oriented database model. Its performance is close to that of MariaDB and MySQL, but its biggest plus is the pgAdmin management interface, which offers a robust set of features and is easy and intuitive to navigate.

MariaDB is a community-developed fork of the MySQL relational database management system, intended to remain free under the GNU GPL. It is notable for being led by the original developers of MySQL, who forked it due to concerns over its acquisition by Oracle. Contributors are required to share their copyright with the MariaDB Foundation. It intends to maintain high compatibility with MySQL, ensuring a ‘drop-in’ replacement capability with library binary equivalence and exact matching with MySQL APIs and commands. It includes the XtraDB storage engine as a replacement for InnoDB, as well as a new storage engine, Aria, that is intended to be both a transactional and non-transactional engine, and may perhaps even be included in future versions of MySQL.

MongoDB is part of the NoSQL database movement that came about to address the shortcomings of relational databases and the demands of modern software development. It is a leading NoSQL database, with significant adoption among the Fortune 500 and Global 500 companies.

Application runtime environments
OpenJDK, PHP runtime and Apache Tomcat are the most widely used open source application runtime environments based on the kind of workloads enterprise class IT runs.

OpenJDK (Open Java Development Kit) is a free and open source implementation of the Java platform, Standard Edition (Java SE). The implementation is licensed under the GNU General Public License (GNU GPL) version 2 with a linking exception. OpenJDK has been the official reference implementation of Java SE since version 7.

The OpenJDK project produces a number of components, most importantly, the virtual machine (HotSpot), the Java Class Library and the Java compiler (javac).

The Web browser plugin and Web Start, which form part of Oracle Java, are not included in OpenJDK. Sun previously indicated that it would try to open source these components, but neither Sun nor Oracle has done so. The only currently available free plugin and Web Start implementations, as of 2016, are those provided by IcedTea.

Apache Tomcat, often referred to as Tomcat Server, is an open source Java Servlet Container developed by the Apache Software Foundation (ASF). Tomcat implements several Java EE specifications including Java Servlet, JavaServer Pages (JSP), Java EL and WebSocket, providing a ‘pure Java’ HTTP Web server environment in which Java code can run.

Tomcat is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation, and is released under the Apache License 2.0.

PHP is a server-side scripting language designed primarily for Web development but is also used as a general-purpose programming language. PHP originally stood for Personal Home Page, but it now stands for the recursive acronym PHP: Hypertext Preprocessor.

PHP code may be embedded into HTML code, or it can be used in combination with various Web template systems, Web content management systems and Web frameworks. PHP code is usually processed by a PHP interpreter implemented as a module in the Web server or as a Common Gateway Interface (CGI) executable. The Web server combines the results of the interpreted and executed PHP code, which may be any type of data, including images, with the generated Web page. PHP code may also be executed with a command-line interface (CLI) and can be used to implement standalone graphical applications.

The standard PHP interpreter, powered by the Zend Engine, is free software released under the PHP licence. PHP has been widely ported, and can be deployed on most Web servers on almost every operating system and platform, free of charge.

Manageability software
OpenDCIM is a free, Web-based data centre infrastructure management (DCIM) application. While proprietary products dominate the DCIM market, OpenDCIM is the most promising open source alternative.
While OpenDCIM doesn’t provide all the features available in commercial products, it covers a majority of them. OpenDCIM is not designed to compete with commercial products. Its biggest advantage is that it is free to use. For data centres of almost any size, the tool is an effective introduction to DCIM concepts, and allows administrators to evaluate the utility of DCIM software in their own environments.

Open source hardware in the enterprise

Open Compute Project (OCP)
The Open Compute Project (OCP) is an organisation that shares designs of data centre products among companies, including Facebook, Intel, Google, Apple, Microsoft, Seagate Technology, Dell, Rackspace, Ericsson, Cisco, Juniper Networks, Goldman Sachs, Fidelity, Lenovo and Bank of America. It is a collaborative community focused on redesigning hardware technology to efficiently support the growing demands on compute infrastructure.

In 2011, Facebook shared its designs with the public and—along with Intel, Rackspace, Goldman Sachs and Andy Bechtolsheim—launched the Open Compute Project, incorporating the Open Compute Project Foundation. The five members hoped to create a movement in the hardware space that would bring about the same kind of creativity and collaboration we see in open source software. And that’s exactly what’s happening.

The Open Compute community has multiple active open hardware streams. There are multiple designs available for storage, networking, servers, racks and general guidelines for designing a data centre.

RISC-V
RISC-V is an open Instruction Set Architecture (ISA) and is delivered as a parameterised core generator. It has a small general-purpose base and multiple optional extensions, including reserved opcodes for unique SoC instructions.

Its goals are:

  • To become the industry-standard ISA for all computing devices.
  • To expand specifications for SoCs, including I/O and accelerators.
  • To ensure that software, which accounts for most of the cost in chip design, can be reused across many chip designs.
  • To encourage both open source and proprietary implementations of the RISC-V ISA specifications.
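The ‘small base plus optional extensions’ structure of RISC-V shows up in the way ISA variants are conventionally named, for example RV64IMAFDC. The Python sketch below splits such a name into its register width, base and single-letter extensions. It handles only this simple form (it does not cover multi-letter extension names), and the function and dictionary names are illustrative.

```python
import re

# Standard single-letter extensions covered by this sketch.
EXTENSION_NAMES = {
    "m": "integer multiplication and division",
    "a": "atomic instructions",
    "f": "single-precision floating point",
    "d": "double-precision floating point",
    "c": "compressed instructions",
}


def parse_isa(isa):
    """Split a simple RISC-V ISA name into width, base and extensions."""
    match = re.fullmatch(r"rv(32|64|128)([ie])([a-z]*)", isa.lower())
    if match is None:
        raise ValueError(f"not a simple RISC-V ISA string: {isa!r}")
    width, base, letters = match.groups()
    extensions = [
        EXTENSION_NAMES.get(ch, f"extension {ch.upper()} (not described here)")
        for ch in letters
    ]
    return {"width": int(width), "base": base.upper(), "extensions": extensions}


parsed = parse_isa("RV64IMAFDC")
print(parsed["width"], parsed["base"])
for description in parsed["extensions"]:
    print(" +", description)
```

A minimal core might implement only RV32I, while an application-class core would add several extensions. Software built for the base runs on both, which relates to the software-reuse goal listed above.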

RISC-V has been adopted by IIT Madras, among others, and RISC-V chips are shipping commercially.

To summarise, enterprise class IT has evolved by adopting open source products and processes, resulting in more agility and helping organisations to stay ahead in the market with speedy innovations. It is no surprise that reducing operational IT expenditure, while simultaneously increasing the level of security and software capabilities, is a top priority for most enterprises. Open source addresses these concerns. Since enterprise open source comes with its own advantages and risks, one has to choose the right set of open source solutions to suit specific needs. Enterprises need to consider the advantages of open source and how it can help them improve their operations.