Channel: data analytics - EnterpriseAI

IBM to Invest $3B to Bring IoT to the Enterprise

The money keeps pouring into the thing called the Internet of Things.

IBM joined a growing list of technology giants this week in announcing plans to invest $3 billion over the next four years to "connect the Internet of Things to the enterprise." The effort announced Tuesday (March 31) includes plans to build a cloud platform that would help partners develop IoT applications.

The company also announced an expansion of its IoT ecosystem to include chip and device makers, including ARM Ltd.

With a clear emphasis on helping enterprises boost their data analytics capabilities, IBM offered this definition of the still amorphous IoT: "an integrated fabric of devices, data, connections, processor and people."

While other IoT entrants like Cisco Systems and Intel are focusing on IoT plumbing, including open interconnects to devices and sensors as well as networking software, IBM appears to be doubling down on its earlier bets on platforms like Smarter Planet and, more recently, its Bluemix platform-as-a-service.

IBM's analytics emphasis, which parallels efforts by other IoT players, centers on an IoT cloud platform designed to help enterprises deal with the waterfall of unstructured data being generated by connected devices and sensors. Hence, the initiative seeks to throw a life preserver to companies quickly drowning in data.

IBM said partners and customers could use its IoT cloud platform to develop "vertical industry IoT solutions." One example was analyzing data from connected cars to help insurers develop "dynamic pricing models" and offering customized services to individual drivers.

Bluemix, IBM's cloud application development platform, is also getting a facelift as part of the IoT initiative. The company said it would add new IoT services as part of Bluemix to funnel real-time data into cloud-based development of IoT apps. By embedding analytics into IoT apps, IBM said it could help upgrade existing applications like asset and facilities management.

Along with the cloud-based IoT platform initiatives, IBM also said it is expanding its ecosystem of partners to include silicon and device manufacturers. Among these are AT&T, the analog and mixed signal chipmaker Semtech and, significantly, microprocessor intellectual property vendor ARM.

U.K.-based ARM, which has made limited headway into the server market dominated by Intel's x86 processors, has already rolled out an "IoT starter kit" that includes an Ethernet version of its microcontroller designed to connect with IBM's Internet of Things infrastructure. ARM's "mbed" microcontroller functions as a drive when plugged into a computer and is designed to help users visualize device data in real time.

Meanwhile, Semtech and IBM announced a technology partnership in early March focused on low-power, wide-area networks to support machine-to-machine communications. Those networks would form the networking backbone of the IoT. The partners said the networking technology is capable of connecting sensors over long distances using minimal power and infrastructure.

As part of its IoT push, IBM also announced an alliance with The Weather Company that will allow the forecaster to collect data from weather sensors and mobile devices. The partners said they would use the platform to help customers use weather prediction in their businesses. IBM and other companies have been trying to build businesses around using long-term weather forecasts to help companies reduce business risks and make informed bets in areas like the pricing of commodities.

IBM's investment is further evidence that the Internet of Things is moving from a buzz phrase to a revenue-generating business model that seeks to leverage cloud computing and big data analytics. Along with IoT partnerships, major players have also been moving to secure backbone technologies needed to flesh out their IoT strategies. For example, Intel is reportedly in talks to acquire programmable logic specialist Altera Corp., a move analysts said would help boost the chip maker's IoT efforts.


Big Data ‘Center of Gravity’ Shifting to Cloud

As practitioners of big data analytics seek to bring computing power closer to the data rather than the other way around, the cloud is emerging as the preferred platform for increasingly sophisticated data crunching by more industries adopting big data strategies.

For now, market watcher Wikibon concluded in a big data vendor and market forecast released this week that cloud services represent only a fraction of a big data market expected to top $35 billion this year. Last year, for example, the market survey found that cloud services accounted for about $1.3 billion of a booming $27.3 billion big data market. By contrast, the "professional services" sector delivering analytic tools represented more than $10.4 billion in big data revenues.

But those trends are said to be reversing as more enterprises move to the cloud and the adoption of big data tools moves up the corporate ladder from IT operators and data scientists to the corner office.

Wikibon reported that big data-related cloud services, including infrastructure-, platform- and software-as-a-service offerings, are "still in the early stages of development" as part of big data deployments, accounting for only 5 percent of the overall big data market.

"The vast majority of Big Data production workloads today are hosted on-premises, as by and large that is where the data being processed and analyzed 'lives'," the market research added.

For now, most big data production workloads are hosted on-premises, but Wikibon foresees the big data "center of gravity" shifting to the cloud as more enterprises deploy cloud infrastructure or embrace hybrid cloud options as a way to leverage cloud technology while keeping sensitive or proprietary data in-house.

"Big data is heavy, and a central tenet of big data is to bring the compute to the data rather than the data to the compute," the survey stressed. "The cloud offers the added benefit of abstracting away significant layers of complexity associated with internally-hosted big data deployments."

Hence, Wikibon sees big data deployments fueling the adoption of cloud infrastructure that will be increasingly easier to deploy. As the cloud becomes the preferred host for big data deployments, Wikibon forecasts that cloud vendors will take a big chunk of revenue away from professional services providers like IBM, Accenture, Deloitte and Capgemini who currently dominate the provisioning of big data analytics.

Indeed, it is increasingly common to hear cloud vendors laying plans to integrate data analytics into their offerings. That approach is widely seen as the best way to move computing power closer to huge volumes of structured and, with the rise of mobile and other connected devices, unstructured data—much of it with a very short shelf life.

Along with IBM, early proponents of cloud-based big data deployments include vendors like Cloudera, the enterprise data hub specialist that is also actively courting the emerging federal cloud market.

Other likely beneficiaries of big data's transition to the cloud are major public cloud vendors like Amazon Web Services, Google Cloud Platform and Microsoft Azure. According to the Wikibon survey, Microsoft racked up an estimated $532 million in big data revenues in 2014, placing the cloud vendor among the top 10 big data vendors. Amazon's 2014 big data revenues totaled about $440 million while Google reached an estimated $225 million last year, the market analyst said.

Overall, Wikibon forecasts that cloud services related to big data deployments could reach $5.69 billion by 2020.

 

In-Memory Platform Targets Real-Time Data

Querying real-time data on a massive scale remains a huge infrastructure hurdle that is increasingly being addressed by in-memory data grids. Hazelcast Inc., an in-memory computing vendor, and partner C24 Technologies have rolled out an in-memory approach that promises to reduce enterprise data storage by as much as half.

The partners said Wednesday (Sept. 9) their "Hypercast" in-memory computing platform responds to the inability of relational database management systems to keep pace with massive data volumes. Its low-latency in-memory approach is said to provide faster processing and querying of real-time data.

The partners maintain that one of the keys to achieving real-time, low-latency performance is the ability to ingest, store and query data in-memory. The catch is that traditional in-memory computing approaches sacrifice real-time queries for data compression and faster transmission speeds.

The Hypercast partnership combines Hazelcast's open-source in-memory computing platform with London-based C24's Preon Java-based data binding technology. The partners are promising that the combination of in-memory data grids with data compression techniques would yield "sub-millisecond" data access speeds.

The partners claimed that internal benchmark testing of Preon data running on the Hypercast platform reached "microsecond speeds."

While Hazelcast claims its binary storage approach helps reduce memory and storage requirements for complex data by "many orders of magnitude," Preon is leveraged in the new platform to ingest and compact high volumes of complex data messages into byte arrays using C24's "virtual getter" interface. In testing, C24 said 7 KB messages were compacted to 200 bytes.
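
To make the compaction idea concrete, here is a minimal sketch in Python. It is not C24's Preon binding or the Hypercast platform; it simply packs a hypothetical message into a fixed-layout byte array and stores it in a Hazelcast map via the vendor's open-source Python client, with the map name and message fields invented for illustration.

```python
# Illustrative sketch only: pack hypothetical trade messages into compact
# byte arrays and store them in a Hazelcast IMap. This stands in for the
# general compaction idea described above; it is not C24's Preon.
import struct
import hazelcast  # open-source hazelcast-python-client

client = hazelcast.HazelcastClient()            # connects to localhost:5701 by default
trades = client.get_map("compact-trades").blocking()

def pack_trade(trade_id: int, price: float, qty: int, venue: str) -> bytes:
    """Pack a message into a fixed-layout byte array instead of a verbose object."""
    venue_bytes = venue.encode("ascii")[:4].ljust(4, b" ")
    return struct.pack("<qdq4s", trade_id, price, qty, venue_bytes)

def unpack_trade(blob: bytes):
    trade_id, price, qty, venue = struct.unpack("<qdq4s", blob)
    return trade_id, price, qty, venue.decode("ascii").strip()

# Store and query by key; the map lives in the cluster's memory.
trades.put(1, pack_trade(1, 101.25, 500, "NYSE"))
print(unpack_trade(trades.get(1)))

client.shutdown()
```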

The partners added that compression capability has been folded into the Hypercast platform, which runs on Hazelcast's elastic cluster of memory servers that can be sized to scale with the application.

Among Hazelcast's financial services customers are Capital One, the Chicago Board Options Exchange and Deutsche Bank. In-memory vendors and financial services software providers are looking for new ways to reduce the cost of huge data volumes by reducing memory footprint without sacrificing fast database queries.

The partners said an unnamed financial services software vendor benchmarked Hypercast against other platforms. The Preon data format was found to store "nearly twice as much data in the same memory footprint" as competing serialization approaches, the partners claimed.

The benchmark results are available here.

The result, the partners assert, is much faster processing of tens of millions of messages per second along with an equal volume of queries.

The demand for real-time data access at scale is growing, especially in the financial services sector. The partners cited industry estimates forecasting that the in-memory data grid market will grow at a 32 percent compound annual rate over the next five years.

The Hypercast platform appears to meet at least some of the key requirements for migrating to in-memory platforms. Among these, experts note, is the ability to optimize in-memory computing for specific applications. As data volumes soar, in-memory systems also need to scale up to the industry standard of about 6 TB. Experts predict that larger systems operated by financial services firms running systems like SAP HANA may require databases as large as 30 TB.

Hazelcast said the new platform incorporates its high-density memory store designed to move "hundreds of terabytes."

IBM Bluemix Adds Apache Kafka Messaging

Cloud vendors continue to add a range of open-source development tools to their platforms. The latest example comes from IBM, which has steadily been building up the toolbox on its Bluemix development platform.

The company said Friday (Oct. 30) it is adding two new services based on Apache Kafka, the real-time distributed messaging system, to its Bluemix cloud development platform. The first, a streaming analytics service, is intended to give developers a better way to visualize data. The second, dubbed IBM Message Hub, aims to provide distributed messaging for cloud applications.

IBM (NYSE: IBM) said the new tools based on the Apache Kafka messaging engine would make it easier for developers to integrate internal data into new applications as well as to visualize and analyze ever-larger datasets.

The addition of Kafka to its Bluemix development platform expands IBM's embrace of open-source tools and platforms. In August, for example, it released a pair of Linux mainframe servers along with parallel Linux initiatives aimed at developing new software distributions for the Linux servers.

The streaming analytics service would help developers expand use of data analytics. In one example, IBM said it used the service in collaboration with the University of Ontario Institute of Technology and Toronto's Hospital for Sick Children. The service was used in neonatal intensive care units to process more than 1,000 readings a second from monitors attached to prematurely born babies. That data was used to spot high-risk conditions much earlier and speed treatment.

Meanwhile, the Message Hub service allows for use of REST or the Apache Kafka API to communicate with other applications.

IBM's adoption of Apache Kafka is the latest boost for the Kafka messaging engine originally developed at LinkedIn (NYSE: LNKD) and adopted by other hyper-scalers like Netflix (NASDAQ: NFLX) and Twitter (NYSE: TWTR). Confluent, a startup founded by the creators of Apache Kafka, announced a $24 million funding round, bringing its total funding raised to $31 million in two rounds.

Confluent, Mountain View, Calif., said it would use the new funding to add new streaming data management features to Kafka along with new elements to the Confluent Platform. The startup said the upgrades would address the enterprise push to leverage real-time streaming data.

Kafka is a real-time, scalable messaging system designed to help integrate data from a variety of applications and systems on a single stream data platform. Along with integrating data, the platform would enable stream processing via a central data hub. IBM's Message Hub service is based on that Apache Kafka capability.
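
To illustrate the publish-and-subscribe pattern that underpins such a hub, here is a minimal sketch using the open-source kafka-python client against a hypothetical local broker and topic; a hosted service such as Message Hub would add its own bootstrap servers and credentials, which are not shown here.

```python
# Sketch of the Kafka producer/consumer pattern behind a distributed message hub.
# Assumes the open-source kafka-python package and a hypothetical local broker;
# a hosted service would supply its own endpoints and SASL credentials.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["localhost:9092"]   # placeholder; replace with the service's endpoints
TOPIC = "device-events"        # hypothetical topic name

# Producer: publish JSON events onto the shared stream.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"sensor": "line-3", "temp_c": 71.4})
producer.flush()

# Consumer: another application reads the same stream independently.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for record in consumer:
    print(record.value)
```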

Confluent touts its own platform as making streaming data available as the low-latency data streams required for real-time stream processing.

IBM launched its Bluemix cloud platform-as-a-service in 2014; the platform now includes more than 120 tools and services for developing data analytics and other applications.

Using Automated Data Analytics and IoT to Enhance Shop Floor Productivity

Data from networked devices — the Internet of Things (IoT) — is beginning to flow from the manufacturing floor to the data center. This IoT data provides an easier, better, and more automated way to monitor the performance of the entire manufacturing process. Many companies are deploying IoT-enabled devices, but using the actual data can be a challenge because most large organizations don’t have a data scientist on staff to analyze the statistics — small to mid-sized factories and manufacturing shops certainly don’t have that luxury. To address this need and create a cost-saving IoT infrastructure, many smaller companies are now turning to automated IoT solutions.

Traditionally, “shop floor” data is analyzed using Statistical Process Control (SPC), which applies statistical methods to gain insight into manufacturing processes. SPC is deployed to both monitor and control manufacturing processes in order to ensure optimal operations. It does this by providing thresholding, where alarms and alerts can be set on IoT (and other) devices. A deeper analysis can be made, however, by using the constantly flowing IoT data to detect work stoppage anomalies, events that could possibly go undetected using SPC and thresholding.

For example, a rapid temperature fluctuation may occur beneath a set threshold, indicating that a failure is imminent. Such a failure mode would go undetected without a deeper analysis of the available data. Modern data analytics systems now make it possible to detect anomalies in data patterns based on normal past behavior, rather than on arbitrary thresholds set by engineers.
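
A small sketch makes the difference between thresholding and learned-behavior anomaly detection concrete. The data, column names and score cutoff below are invented for illustration and do not represent Numascale's production models; the point is simply that a reading can be statistically abnormal while staying well under a fixed alarm limit.

```python
# Sketch: flag sub-threshold anomalies by comparing readings against learned
# normal behavior rather than a fixed alarm limit. Data and parameters are
# hypothetical; this is not Numascale's production model.
import numpy as np
import pandas as pd

ALARM_THRESHOLD_C = 90.0   # classic SPC-style limit that is never crossed here

rng = np.random.default_rng(0)
temps = 70 + rng.normal(0, 0.5, 600)       # normal operating data
temps[500:510] += 8.0                      # rapid fluctuation, still below 90 C
series = pd.Series(temps, name="temp_c")

# "Train" on an engineer-selected window of known-good behavior.
baseline = series[:400]
mu, sigma = baseline.mean(), baseline.std()

# Score the live stream: large z-scores are anomalous even under the threshold.
z = (series - mu).abs() / sigma
anomalies = series[(z > 6) & (series < ALARM_THRESHOLD_C)]

print(f"threshold alarms: {(series >= ALARM_THRESHOLD_C).sum()}")   # 0
print(f"learned-behavior anomalies: {len(anomalies)}")              # the fluctuation
```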

Numascale’s data science team has developed a fully automated end-to-end solution that uniquely addresses this cost saving capability by preventing expensive downtime. The process works using six simple steps:

  1. Edge Analytics Appliances are deployed to read and collect data (these Edge analytics appliances also do the real-time analytics when models are trained).
  2. A model is automatically trained with a real incoming stream of IoT data, based on a time window set up by the engineer (who would have an idea of what is considered normal behavior).
  3. The model is built on back-end Numascale servers or in the cloud.
  4. The model is pushed back down to the Edge Analytics Appliance.
  5. Once in operation, the model offers much more intelligent monitoring than the previous threshold approach.
  6. Model performance is tracked and the model is updated when required.

Live Metrics with Anomalies

Numascale's approach solves a number of deployment issues that have been common in the past. For instance, a typical six-month analytics deployment timeframe becomes just a few weeks because a robust suite of analytical models for machine anomalies is already built. Secondly, no data scientist is required at the customer/user site, only an experienced floor engineer who can help determine the time window of good data (the length of time that provides an adequate amount of normal operational data) and similar parameters.

Lastly, customers can choose to deploy a Proof Of Concept in the cloud and thus bypass long procurement processes and approvals. Since the costs will be pay-per-use, it is easy enough to show ROI, and then go for a larger on-premise deployment based on ROI data.

Numascale provides scalable turnkey systems for Big Data Analytics by combining optimized and maintained open source software with cache coherent shared memory architecture. Their IoT solution was developed in conjunction with their partners and customers, such as Intel, Adlink, Dell, and Singapore Technologies.

numascale.com

Spreading Spark Enterprise-wide

Spark is in the spotlight. Companies with big data analytics needs are increasingly looking at the open source framework for lightning-quick in-memory performance – reputedly up to 100X faster than Hadoop MapReduce (according to http://spark.apache.org/). As the data tsunami rolls on and quintillions of bytes of data are generated every day, Spark is one of the answers to the daunting task of pulling insight and value out of oceanic data sets.

But it’s also often the case that business analysts and data scientists in the enterprise are so eager to get their hands on Spark that they stray off the IT reservation and set up ad hoc Spark clusters, causing resource strains, siloed data, security risks and other management challenges.

The launch of IBM’s Platform Conductor for Spark is intended to keep Spark under the big IT tent, enabling production-ready, IT-approved management of multiple Spark instances across the enterprise. IBM calls it a hyperconverged, multi-tenant offering that uses the Spectrum Scale (formerly GPFS) File Placement Optimizer to bring the Spark environment to massive data sets.

Nick Werstiuk of IBM

“We’re delivering the ability to have a common file system across the nodes in a Spark cluster that provides both GPFS and Posix access to the data,” Nick Werstiuk, product line executive, software defined infrastructure, IBM Systems, told EnterpriseTech. “So it gives clients the ability to move the data in and out of the Spark environment according to data life cycle management needs.”
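
A minimal PySpark sketch shows the pattern Werstiuk describes: a shared file system exposed through a POSIX mount lets Spark read data in place, work on it in memory, and write results back for later lifecycle stages. The mount path and column names below are assumptions for illustration, not part of Platform Conductor's tooling.

```python
# Minimal PySpark sketch of the shared-file-system pattern described above.
# The POSIX mount path and schema are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("shared-fs-demo").getOrCreate()

# Read directly from the shared POSIX mount (e.g., a GPFS/Spectrum Scale fileset).
events = spark.read.csv("file:///gpfs/analytics/events/*.csv",
                        header=True, inferSchema=True)

# Keep the working set in memory across multiple actions.
events.cache()

daily = (events
         .groupBy("event_date")
         .agg(F.count("*").alias("events"),
              F.avg("latency_ms").alias("avg_latency_ms")))

# Write results back to the same shared file system for the next stage.
daily.write.mode("overwrite").parquet("file:///gpfs/analytics/summaries/daily")

spark.stop()
```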

“Users facing the challenge of running Spark in a production environment need an end-to-end, enterprise-grade management solution,” said Carl Olofson, research vice president, application development and deployment at industry watcher International Data Corp. “IBM has made a major commitment to supporting organizations’ Spark needs, and is offering IBM Platform Conductor for Spark as such a solution.”

IBM Platform Conductor for Spark is the third offering in IBM’s software-defined Platform Conductor shared infrastructure portfolio, the others being Platform LSF, for HPC design, simulation and modeling workloads; and Platform Symphony, for high-performance risk analytics. Platform Conductor sits atop the IBM Platform Computing common resource management layer that can be implemented in a distributed environment on a variety of on-prem (OpenPOWER, x86) hardware platforms or hybrid cloud infrastructure.

IBM said the product aims to achieve faster time-to-results and simplified deployments via lifecycle management capabilities, including resource scheduling, data management, monitoring, alerting, reporting and diagnostics, that allow multiple instances of Spark jobs to run on resources that would otherwise be idle.

“We see Spark as one of the critical new sets of workloads evolving out of the Big Data ecosystem, the next foundational middleware for Big Data analytics,” Werstiuk said. “Our vision is for these workloads coming together on a shared infrastructure. Here’s a set of software capabilities you can deploy on your own hardware – scale-out x86 or PowerLinux environments – and essentially start up an all-inclusive Spark infrastructure for a multi-tenant, shared service Spark capability to multiple users or data scientists or lines of business within an enterprise.”

Data Analytics Drives New Fan-Team Relationship for NBA’s Orlando Magic

Short of shooting three-pointers and grabbing rebounds, advanced data analytics has taken on a pivotal role for the Orlando Magic of the National Basketball Association, driving nearly every aspect of the organization’s relationship with its fans in one of the most sophisticated data-driven marketing and operations strategies in professional sports.

The Magic partnered last year with Venuenext, a small San Jose tech firm with a clientele that includes the New York Yankees, San Francisco 49ers and the Dallas Cowboys. Venuenext has integrated the Magic’s data platform with myriad third-party ticketing, food & beverage and retail systems, and driven apps out to the edge in the form of season ticket holders’ smart phones – a “remote control” for the fan experience. Together, Venuenext and the Magic are intent on using predictive analytics to transform not only the experience of attending games at the Amway Center in Orlando but also the very nature of the fan-team relationship.

In short, long gone are the days when a sports franchise could add Gulden’s to the yellow mustard in the hot dog condiment rack and jug wine to the concession stand menu, and call it a premium fan experience. Now teams are using data analytics for personalized interactivity with their fans that delivers entertainment, benefits and a sense of membership – along with more revenue.

Orlando Magic CEO Alex Martins

“We’ve created an app that really coalesces all of the platforms that we utilize from a fan experience standpoint into one location, into one app for our fans on their smart phones,” said Alex Martins, who joined the Magic as PR manager in 1989 and became CEO in 2006. Speaking recently at the Structure Data conference in San Francisco, Martins said Venuenext handled the platform integration work.

“None of this was possible before our partnership with Venuenext,” said Martins, adding that the sports industry has been “late to the game” in the use of data analytics. “They created the platform that we’ve been able to integrate everything from POS systems, to ticketing systems to parking, and it’s allowing us to collect more data and utilize it to grow our business in a way that we’ve never been able to do before.”

According to Venuenext CEO John Paul, who spoke with Martins at the Structure Data conference, an important factor in the effectiveness of the partnership was the Magic’s willingness to give Venuenext full access to its existing data analytics system and allow it to be integrated with the Venuenext platform. “I’ve never heard that before or since,” Paul said. This was the basis for Venuenext then integrating the Magic’s data and analytics models with a number of third-party retail and ecommerce systems, from Google Wallet and Apple Pay to Ticketmaster to the Chick-fil-A promotional couponing app.

Paul said Venuenext opens its APIs to its clients to encourage broader systems integration. “Mostly, we’re integrating other peoples’ platforms into ours, but we’ve said to everyone that, yes, we’d like more innovation, we can’t do it all,” said Paul. “So we will open up our APIs to our partners, let them utilize what we’ve got in our platform, and everything we add to our platform makes it more productive.”

The result is that much of the fan experience now takes the form of active, personalized engagement with the Magic, activity that generates data fed into an increasingly intelligent, predictive platform. “They can do everything from parking, to transfer tickets, to upgrade their tickets, and order food and beverages from their seat and have it delivered to their seat.”

Martins said one aspect of the mobile app is a monetary and merchandizing system allowing season ticket holders to sell individual game tickets back to the team and receive in return “Magic Money,” account credits they can use to purchase additional tickets to future games, for food and beverages or for retail items sold in the Amway Center. “We’ve changed what used to be a season ticket purchase into what is now a membership, utilizing all this data,” he said.

Indeed, much of the Magic’s focus – as with any sports team – is maintaining a high season ticket renewal rate, and much of that involves enhancing the value of fans’ investment in the team.

“Think about it from a fan point of view,” said Paul. “For the old season ticket holder, here are two sets of tickets, he’s had them for 20 years, his dad passed them down to him. There are 41 games, he can’t go to all of them, so he generally feels he’s overpaid for those seats. But what the Magic have done is said, ‘If you’re not going to use those seats turn them back in.’ They get to sell the seat again on the secondary market, or they get to use them for one of those upgrades, and I (the fan) get that value, I can come back to the next game with five people in five seats together. So I feel like I’m a member of the Orlando Magic Club, and I’m getting all the value I’ve paid for my season tickets.”

One fan, Paul said, uses his Magic Money to pay for valet parking at the Amway Center.

The Magic have also taken on the single biggest fan complaint: missing portions of the game while waiting in long concession lines. Season ticket holders using the app get access to fast lane passes.

A key to the success of the mobile app has been its high adoption rate. According to Paul, most fan apps get around 5 percent usage, but approximately 80 percent of the Magic’s season ticket holders have the team’s app, and roughly 30 percent of the fans in attendance at games use it. App downloads are up 300 percent over a year ago, according to Martins.

“This has been a challenge for our industry,” Martins said, “finding a manner in which we can raise the number of downloads for our apps for teams. We’ve been trying to crack that code for several years. This finally has been the one opportunity for us to increase those downloads exponentially and increase the use of our apps overall.”

With high app usage comes more data, giving the organization better predictive insight into fan behavior. One of the first things the Magic did with their enriched data was to bring rigor to what had been an annual exercise in intuitive guesswork: setting ticket prices for the following season.

“When we were going through things like yield management, and ultimately determining ticket pricing, the annual exercise was like, ‘OK, based on how the team was this year we’ll go with a 4 percent increase,’” Martins said. “Or, ‘We weren’t very good this year so we’ll remain flat.’ There was no data that went into the whole ticketing process.” Once ticket prices were set, prices remained static.

Now, Martins said, ticket price strategy is based on data. The process begins with season ticket pricing, which is decided upon with a predictive analytics model that assesses the likelihood of renewals based on actions of season ticket holders during the previous year. Factors include: how often they utilized their tickets, how many times they transferred or sold their tickets, how much food and beverage they purchased, and so forth. All these behaviors indicate the level of fan commitment and their tolerance, if any, for a price increase the following year.

“It allowed us to build a predictive model and create a road map for our service reps to determine which of our clients we needed to spend the most time with to, hopefully, transition them into renewing for the following season,” said Martins.
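
The following sketch shows what a renewal-likelihood model of this kind might look like. It uses scikit-learn's logistic regression over behavioral features that mirror the ones Martins lists, but the data, feature names and model choice are invented assumptions, not the Magic's actual system.

```python
# Sketch of a renewal-likelihood model like the one described above.
# Features, synthetic data and model choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000

# Hypothetical per-account behavior from last season.
games_attended = rng.integers(0, 42, n)
tickets_resold = rng.integers(0, 30, n)
food_bev_spend = rng.gamma(2.0, 150.0, n)

# Synthetic label: heavier engagement makes renewal more likely.
logit = 0.08 * games_attended - 0.05 * tickets_resold + 0.002 * food_bev_spend - 1.0
renewed = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([games_attended, tickets_resold, food_bev_spend])
X_train, X_test, y_train, y_test = train_test_split(
    X, renewed, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Service reps could sort accounts by predicted renewal risk.
risk = 1 - model.predict_proba(X_test)[:, 1]
print("held-out accuracy:", round(model.score(X_test, y_test), 3))
print("highest-risk accounts (indices):", np.argsort(risk)[-5:])
```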

Individual game tickets are based on season ticket pricing, but are then handled in a fast changing, game-by-game manner based on real-time data from the secondary market. “We transitioned to a dynamic ticket pricing system, whereby our ticket prices change by the minute, based on the demand and real-time data that feeds our entire ticketing system.”

Pricing factors include which night of the week the game is taking place (“Monday night games are death” - Martins), the quality of the opponent and whether the team is on a hot or cold streak. The variables are fed into the Magic’s seven-tier ticket pricing system.
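
As a toy illustration of how such factors could map onto tiers, the sketch below blends night of the week, opponent quality, win streak and secondary-market demand into a demand score and buckets it into seven tiers. The weights and cutoffs are invented; the Magic's actual pricing model is not public.

```python
# Toy sketch of demand-driven tiering using the factors mentioned above.
# The scoring weights and tier cut-offs are invented for illustration.
WEEKDAY_WEIGHT = {"Mon": 0.2, "Tue": 0.3, "Wed": 0.4, "Thu": 0.5,
                  "Fri": 0.8, "Sat": 1.0, "Sun": 0.7}

def demand_score(weekday: str, opponent_rating: float, win_streak: int,
                 secondary_price_ratio: float) -> float:
    """Blend demand signals into a single 0-1 score."""
    score = (0.35 * WEEKDAY_WEIGHT[weekday]
             + 0.30 * opponent_rating            # 0 (weak draw) .. 1 (marquee team)
             + 0.15 * min(win_streak, 5) / 5
             + 0.20 * min(secondary_price_ratio, 2.0) / 2.0)
    return min(score, 1.0)

def price_tier(score: float, tiers: int = 7) -> int:
    """Map the score onto tiers 1 (cheapest) .. 7 (premium)."""
    return min(int(score * tiers) + 1, tiers)

# A Saturday game against a marquee opponent during a winning streak, with
# secondary-market prices running 50 percent above face value:
print(price_tier(demand_score("Sat", 0.9, 4, 1.5)))   # lands in the top tier
```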

Adoption of these techniques has come at a particularly useful time for the Magic because the team – in possession of a lowly 29-40 record, good for 12th place in the Eastern Conference as of this writing – has been in rebuilding mode for three years.

“Traditionally what would happen in those periods of time is you’d just accept that your team was not that good, you would set your pricing at a flat level compared with the year before and you’d deal with the fact that your volume would be down,” Martins said. But now the team can better track high-demand games and set ticket prices accordingly, or drive volume on what would otherwise be a low-revenue game by lowering prices.

“What it’s done is help us raise our top line revenue in a period of time where we just haven’t been very good on the court,” Martins said.

Another strategic benefit of the system is precise measurement of sponsor promotions.

“We’re working on a system where instead of handing fans a coupon as they’re leaving the building that they can redeem tomorrow,” said Martins, “we’ll be able to integrate that into our app, they’ll be able to take the Magic app to the sponsor’s restaurant the following day and through POS interaction redeem that food order right off the application. All the data we can exchange and interchange with our clients. It gives us the ability to say, ‘You know what, your promotion truly worked this year,’ as opposed to guessing at how many people showed up to redeem those sandwiches.”

Martins said the Magic’s use of data analytics is only just beginning. Future ambitions include couponing promotions that tell fans in advance which sponsors will hold give-aways of which specific products and a gamification strategy designed to increase fan engagement during games that includes specialized replays and statistics.

“From our standpoint it’s another opportunity to take this app to a whole other level and really engage the fans with the play on the court itself,” Martins said.

Cloud Platforms Help Wrangle Trader Data

A new data-as-a-service platform targeting high-frequency traders adds momentum to the growing trend toward combining cloud-based computing with precise market data. In this case, the data arrives in a widely used, time-sensitive format called GPS packet capture.

Manhattan-based financial software and data management specialist MayStreet this week rolled out a data platform that targets precise GPS-timestamped capture data, otherwise known as .pcap, collected at liquidity venues and processed on cloud-based computers. The platform is designed to address the growing need for speed along with extremely precise data presented in specific formats such as .pcap that are more amenable to market research and compliance workflows.

What differentiates MayStreet's data platform, the company claimed, is its ability to handle data in a variety of formats ranging from .pcap files to binary-compressed formats. The combination of packet-capture data and MayStreet's software and cloud-based computing is intended to help build research and compliance infrastructure for the U.S. equities, options and futures markets.
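
For readers unfamiliar with the format, here is a minimal sketch of how .pcap market data can be walked and replayed to an exact point in time, using the open-source dpkt library. The file name and the framing of "feed messages" are placeholders; this is not MayStreet's software or APIs.

```python
# Sketch of working with .pcap market data using the open-source dpkt library:
# walk the capture, keep each packet's timestamp and payload, and replay the
# stream up to a chosen instant. File name and message framing are placeholders.
import dpkt

def index_capture(path):
    """Yield (timestamp, udp_payload) pairs from a packet capture file."""
    with open(path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if isinstance(ip, dpkt.ip.IP) and isinstance(ip.data, dpkt.udp.UDP):
                yield ts, bytes(ip.data.data)   # raw feed message payload

def replay_until(path, cutoff_ts):
    """Return all feed messages observed up to an exact point in time."""
    return [payload for ts, payload in index_capture(path) if ts <= cutoff_ts]

messages = replay_until("market_feed.pcap", cutoff_ts=1_464_800_000.000123)
print(f"{len(messages)} messages up to the cutoff")
```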

MayStreet CEO Patrick Flannery noted in a statement that the goal was to make available to more traders the same "foundational tools" used by leading high-frequency trading firms. The combination of precision packet capture, cloud computing resources and the company's software suite are intended to help buy- and sell-side firms merge large volumes of market data in a variety of formats with information on trading activity.

The payoff, Flannery asserted, is "a comprehensive view of the markets at exact points in time in the past."

The company's approach also reflects the ongoing search for competitive advantage in cutthroat capital markets that place a premium on high-quality data, flexibility and reduced operating costs. Hence, industry watchers note that more trading firms that no longer want to manage their own IT infrastructure are shifting their operations to the cloud as reliability, trust and security improve.

MayStreet's data platform for traders also illustrates a shift within the financial technology sector to what analyst Terry Roche of market research firm TABB Group calls "third-generation platform services." These cloud-based platforms combine computing with big data analytics, mobile technologies and social media platforms.

The emerging platforms "will enable a holistic global community to collaborate, be informed [of] market ideas and opportunities delivered with high levels of security, efficiency, resiliency and regulatory compliance," Roche noted in a research paper.

Moreover, cloud-based platforms "will eventually affect all financial services workflows," Roche predicted.

While latency issues remain an ongoing technical challenge for high-frequency traders, MayStreet's platform also underscores the growing role of low-latency processing in data analytics. One reason is that traders are required by law to retain massive amounts of data. That has created a requirement for low-latency platforms to churn through data to meet regulatory reporting requirements.

"The bigger the data set the faster the computational regime to analyze that data in a time sensitive manner," Roche said.


The Emerging ‘Internet of Machines,’ FPGAs, and the Discovery of Knowledge We Don’t Know Exists

The advanced scale computing landscape of the future will have a broader diversity of processors, a focus on matching processors to application domains, and the use of machine learning techniques that teach systems to self-optimize as they take on problems of the highest complexity. A key enabler: FPGA-driven accelerated computing in the cloud. That’s the vision of Steve Hebert, CEO of Nimbix, a provider of HPC cloud computing services. Hebert outlined the path to this future at the recent Nimbix Developer Summit held near Dallas. Here are highlights from Hebert’s keynote:

I’m going to talk about a concept called the Internet of Machines, the interconnection and networking of machines. I want to paint a picture of the world we see coming that’s a direct product of the Internet of Things and all the data that’s being produced that we have to process.

We’ve lived by the idea of Moore’s Law, it’s woven into our DNA. What’s really interesting is not whether Moore’s Law is alive or dead. What’s interesting is the concept of “predictive comfort,” the idea that the entire industry, every equipment manufacturer that builds something that uses chips, has this predictive comfort of knowing that in two years we’re going to have double the transistors and new performance and new capabilities.

It's hard to step back from this because we've thrived for 50 years on Moore's Law. It begs the question: what happens if this predictive comfort goes away? As we've moved into the era of multicore and heterogeneity of architecture, we see erosion of predictive comfort. The market is tussling over what comes next. And the very question is forcing innovation in new areas. Specifically, I believe we're seeing a deeper focus on the applications, the functions these applications are demanding, versus general purpose architectures. And we're going to have the silicon real estate to start to explore this.

Nimbix CEO Steve Hebert

Our conundrum at this moment, as developers and technology workers in this ecosystem, is this: at the very moment that our chips stopped getting faster we have this tremendous explosion in data that we have to process to help solve problems, to introduce new services, to scale and tackle all the complexities of our world today.

Part of this challenge is that we have a number of different application domains that are extremely demanding. It's clear in these domains there are particular compute requirements that might be favored in one application domain versus another. Some applications may want traditional CPU cores, whether it's x86 or POWER or CUDA. Some applications demand significant amounts of memory; some want different flavors of floating point, and all of these things impact what solutions are brought to market.

What's interesting is that the general purpose architectures of the past many years are not quite getting us there when it comes to the specific demands of these application domains and what they're requiring of infrastructure. When I talk about the Internet of Things, data transformation and data analysis is there as well. We have billions and trillions of devices attached to the internet that are producing troves and troves of data. So we then have to think about the useful things we want to do with all this information, how do we transform that data into answers and new knowledge. We're in the midst of transforming and accelerating how we tackle the demands that are going to be thrust upon us in this model.

What are the demands of this paradigm? One of the key demands is real time answers, or very near real time. For example, what Google has done with internet search (with voice command). We’re now simulating whole systems, not just samples or a subset. Or gaming, where NVIDIA is doing multiple physical simulations in real time of fluid dynamics, with smoke, water, explosions and all being computed in real time, on the fly. This has exponentially increased demand on computation.

So if we argue we’re approaching, or have arrived at, the end of Moore’s Law, this is driving the adoption, from a software perspective, of new architectures. We’ve had alternative architectures for a long time. But we’re finally seeing a point where the economics of driving new architectural evolution is being thrust upon us.

One of the big takeaways of this is that the tools are arriving for software developers to take advantage of this right now. One of the more prolific co-processing architectures is attached GPUs. And because they were designed to do image processing, a specific application, really well, we’ve been able to apply that to a whole host of other problems. The other thing about GPUs is that for specific applications you can see significant speed-ups relative to CPUs, it’s an alternative architecture to a general purpose CPU-based approach to solving a given set of problems.

(At the core) of my thesis on the Internet of Machines are FPGAs. The idea is you have a blank slate of silicon that you can put whatever you want inside to affect your computation or specific function. These have been very important components in the communications industry, and with FPGAs we're seeing the entry point of a revolution in the computing industry. This is evident in Intel's purchase of Altera for $17 billion. That makes a statement.

As you reach the end of traditional process technology you re-orient your thinking about what we can do with that silicon real estate. The innovation trajectory becomes more about what to do in the silicon, and that drives costs down, which means we can now start to see a more widespread use of this kind of technology.

Let's look at machine learning. It's no coincidence it's emerging around this time of transformation because we need automation to help us chug through all this data. Let's train machines to help us. We can apply "unsupervised learning" to assemble unstructured data. This is very important because there's a lot of information we don't even know, or knowledge we don't know exists. It's there, but because we don't have the capacity to process the petabytes of data to create structure or categorization of unstructured datasets, we need to teach our machines to do that. This is evolving very rapidly. It's due to the availability of cloud computing, the ability to scale out to help any developer leverage and create a data model and teach it to learn.

Let's apply machine learning to reconfigurable silicon (FPGAs) and data processing as a process. We have access to millions of API calls. We can create in the Internet of Things millions and millions of API calls that can be fed to train machines how to best process them. And you can actually define those rules that you wish to optimize around. So we might want a machine to, say, look at the most energy efficient way to process a payload. Or let's optimize for run time. Or let's optimize for lowest cost. Whatever the set of rules for the environment, we can teach the machines to tune for those specific things.
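
As an editorial aside, a toy sketch of "define the rules you wish to optimize around" might look like the following: given profiled candidate configurations for a workload, select the one that minimizes the chosen objective. The configurations and figures are invented placeholders, not the learned policies Hebert describes.

```python
# Toy sketch of rule-driven target selection: minimize the chosen objective
# over profiled candidate configurations. All names and numbers are invented.
CANDIDATES = [
    {"target": "cpu-cluster",    "runtime_s": 420, "energy_kj": 900, "cost_usd": 3.10},
    {"target": "gpu-node",       "runtime_s": 95,  "energy_kj": 650, "cost_usd": 4.80},
    {"target": "fpga-bitstream", "runtime_s": 60,  "energy_kj": 210, "cost_usd": 5.50},
]

OBJECTIVES = {
    "energy": lambda c: c["energy_kj"],
    "runtime": lambda c: c["runtime_s"],
    "cost": lambda c: c["cost_usd"],
}

def choose_target(objective: str) -> dict:
    """Pick the candidate configuration that minimizes the selected rule."""
    return min(CANDIDATES, key=OBJECTIVES[objective])

for rule in ("energy", "runtime", "cost"):
    print(rule, "->", choose_target(rule)["target"])
```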

The idea with reconfigurable silicon as an integral part of a cloud paradigm is to take a set of workloads, whether those are grouped as labels, rendering or simulation, and allow the machines to define the optimal way to process those workloads. Further, let the machines teach the machines what needs to be processed to begin with. There may be a set of things we don't know we should process, but the machines then can develop an idea, based on what we teach them, of what should be processed, what data is valuable, what information we might gain new insights from.

Where we arrive is at the "engine room" for the Internet of Things. I think of the IoT as the attached devices themselves, and the Internet of Machines sits around it and is the engine room that processes all this information to drive meaningful answers. So think of it as intelligent systems that, with reconfigurable silicon, have the tools to self-optimize, to program their own bitstreams, to be able to not just automate but accelerate the distribution, collection and transformation of the massive amounts of data that we have.

I like the term “accelerate” because we live in a world of exponentials. In the technology industry we see the acceleration of the curves we are on, and these systems help give us yet another exponential component in accelerating how we can process information. Which means we can cure cancer faster, solve world hunger faster, colonize Mars faster, solve transportation logistics faster.

As our population continues to swell, there are problems and challenges we want to meet with answers faster than we could if we were on a traditional linear curve. This is what I’m extremely passionate about: how we help accelerate our time-to-results of some of our most complex problems.

We believe this is the initial evolution of reconfigurable cloud computing for hyperscale data processing, the ability to have a set of machines that can self-optimize at the appropriate time. We’re not there yet. We’re at the first evolution. Let’s introduce the concept, let’s introduce the technology, put it into the hands of the community, into the hands of the smart people who can help write the code, create the algorithms that then allow us to evolve these sets of systems that help us process the data.

Big Data at Netflix: Testing, Testing, Testing…

Netflix is a company riding, you could say driving, the streaming internet video wave. Much of that wave is founded on Open Source big data analytics in the cloud at scale – at extreme scale. Five years ago, when Brian Sullivan, director of streaming analytics at Netflix, joined the company, it was still focused on what now seems a primitive business model: mailing DVDs to a U.S.-only customer base.

While the company still has a mail-order remnant, Netflix is nearly completely converted to streaming internet video, a company with an enormous title catalogue and a successful content creator (“House of Cards,” “Orange Is the New Black,” etc.). In addition, Netflix has rapidly expanded to 81 million customers in 190 countries (excepting the most populous one, China: “We’re still trying to work out some of the details” – Sullivan), making it the first internet television network. Driving this growth is a “bias to action” ethos that, as the company has embraced new data technologies, pushes managers like Sullivan and his team to innovate based on continual experimentation and a test-and-adopt discipline.

Speaking at the recent Apache Big Data conference in Vancouver hosted by The Linux Foundation, Sullivan shared the outlines of Netflix’s data analytics-driven methodology, which has as much to do with technology as it does its “freedom and responsibility” culture, whose aim is to preserve an aggressive, self-directed and loosely coupled entrepreneurialism in a company that’s grown to more than 3,500 employees (and $6.7 billion in revenue). The Netflix ethic is to continually disrupt: supersede the old neighborhood video store / Blockbuster model, then the Redbox retail kiosk model and then, finally, its own DVD-by-mail model.

Brian Sullivan of Netflix

“We truly are a data-driven organization,” Sullivan said. “It’s in our blood. Any time anyone proposes a change to the product, whether it’s to add a feature or to simply improve functionality, we test it. We’ve built a really robust experimentation framework and we use it to analyze any change we make to the product, live and in production, with a subset of our users. And since we’re talking about such a large population of our user base to experiment on, we’re able to do this in a very rigorous fashion with our big data (capabilities). This allows us to iterate quickly through experiments in parallel and talk intelligently about the outcomes.”

Changes are adopted if they meet two criteria: “If a change doesn’t specifically move the needle on a key metric, like retention, we typically don’t move forward with the change.” In addition, changes can only cause minimal dislocation. “Our goal is to improve the product but also keep it as simple as possible. This allows fewer barnacles to develop, which would slow things down for any future innovation.”
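
The kind of rigor described above can be illustrated with a minimal sketch: compare retention between a control cell and a treatment cell using a two-proportion z-test. The member counts are invented and this is not Netflix's experimentation framework, only the statistical idea behind "moving the needle on a key metric."

```python
# Sketch: two-proportion z-test for a retention A/B comparison.
# Cell sizes and retention counts are hypothetical.
import math

def retention_z_test(retained_a, n_a, retained_b, n_b):
    """Return the z statistic and two-sided p-value for the retention difference."""
    p_a, p_b = retained_a / n_a, retained_b / n_b
    pooled = (retained_a + retained_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided
    return z, p_value

# Hypothetical cells: 500,000 members each, treatment retains slightly better.
z, p = retention_z_test(retained_a=455_000, n_a=500_000,
                        retained_b=456_400, n_b=500_000)
print(f"z = {z:.2f}, p = {p:.4f}")   # ship only if the needle moves significantly
```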

Netflix is committed to Open Source, with many Apache ecosystem products in store. Cassandra is critical to Netflix operational systems, Kafka is the backbone of its data event pipeline, and much of the analytics is done using Hadoop with a mixture of Pig, Hive and Presto, with increasing use of Spark. In addition, Netflix is committed to the cloud: Amazon Web Services.

It's an infrastructure that operates at scale: delivering and supporting 125 million hours of streaming video per day (more than a third of North American internet traffic); from a data standpoint, Sullivan said, that’s about 600 billion daily events. The Netflix data warehouse contains roughly 40PB and processes 3PB of data on a daily basis, adding about 300TB of net new information.

“What’s interesting about our ecosystem is because we’re big believers in cloud, we use AWS, so that allows us to put our data on S3. It’s a natural place for our data lake because this is where our service systems are. It’s one central repository that we can spin up multiple Hadoop clusters on top of. That allows us to separate our compute layer from our storage layer, and it lets us custom scale different clusters if we distribute our processing a little, or do things like upgrades of Hadoop or even introduce new tools, like Spark.”
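
The storage/compute split Sullivan describes can be sketched briefly: the data sits in one S3 location, and any short-lived Spark cluster can attach to it, run its queries, and go away. The bucket, prefix and credential setup below are assumptions for illustration only.

```python
# Minimal sketch of compute attaching to central S3 storage.
# Bucket, prefix and column names are hypothetical; credentials are assumed
# to be configured on the cluster already.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ephemeral-warehouse-reader")
         .getOrCreate())

# The same central data can be read by this cluster or any other spun up later.
events = spark.read.parquet("s3a://example-data-warehouse/streaming_events/2016-05-01/")

events.createOrReplaceTempView("events")
top_devices = spark.sql("""
    SELECT device_type, COUNT(*) AS plays
    FROM events
    GROUP BY device_type
    ORDER BY plays DESC
    LIMIT 10
""")
top_devices.show()

spark.stop()
```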

Netflix is driven by both the expansion and retention of its customer base. Sullivan said the company has the advantage of a customer-centered model undiluted by the contradictions built into the model of other large internet companies, some of which sell their customer data, or customer lists, or include advertising in their content.

“We have a holistic relationship with our customers,” Sullivan said, because the single source of Netflix revenue is customer subscriptions. “The dual nature of some of the other companies’ (business models) leads to potential conflict in product development – how to introduce ads but not make them so annoying that users go away. We don’t have to worry about that. We have a really central relationship with our subscribers because they are effectively giving us money to stream video and we can make that product better, and if we do a really good job at innovating our product then people retain their subscriptions, and if we don’t do a great job people will cancel.”

The alignment between Netflix and its customers, Sullivan said, extends to how it licenses content. Subscribers can watch a given piece of content as many times as they would like. “When we buy rights to stream a particular title in our catalog we’re buying that title for a period of time, not on a per-play basis, like music rights are. We want you to consume as much of Netflix as you can or you want to, and that shows me you’re engaged with the product. Not surprisingly, streaming usage is highly correlated with retention.”

He also said that from a content perspective, Netflix isn’t bound by some of the limitations of traditional broadcast television networks. “We’re not worried about a broadly appealing, small set of shows that go into a limited number of prime time slots. We can think about a broader catalog that we know will be enjoyed by a wide spectrum of people, each one with differing tastes, and they can watch exactly what they want to watch. We don’t have to worry about the ‘lowest common denominator’ problem.”

With its relatively straightforward customer relationship in place, Netflix goes about using data analytics to measure all aspects of customer interaction. This includes the user interface, which is constantly tinkered with. “We’ve experimented with all sorts of tweaks on the layout, what types of metadata we show for any given show, improved features like search, added features like user profiles, and even down to the images that we use in the UI. The idea is certain images will pull more people in to stream more.”

He said regions and countries respond differently to various designs – in some, print is either more or less of a draw than images, for example. “We were able to measure this in a statistically significant fashion in our subscriber base. The idea is to keep more people engaged and really finding quickly the thing they want to watch within the product, and that has a virtuous cycle of feedback into enjoyment of the product. It’s interesting to learn more about our users and to personalize off of that.”

Sullivan said he and his team also continually refine their predictive analytics capabilities. “We can tune our recommendations based off of explicit signals, like user ratings, and we can also look across broad patterns across our ever changing user data. We evolve our algorithms using this changing set of features and models to get the right titles in front of the subscribers, ideally predicting exactly what they want to watch.”

He broke down the infrastructure in several parts, starting with the streaming platform, which resides under the UI, and is supported by four engineering groups. On top of that is AWS, which builds the dozens of server applications used by Netflix and delivers content to the device ecosystem (phones, tablets, smart TVs, game consoles, etc.). Device usage data is analyzed “and that allows us to be smarter about how we partner with the device partners – Sony, Samsung, Apple, Google, Microsoft.”

In the middle of the infrastructure, Sullivan said, is the streaming client, a source of telemetry from the subscriber’s device that provides data about the actual playback experience – how much time it takes for videos to start up, the quality of image resolution and how often subscribers encounter rebuffers and playback failures. This is accompanied by data about the performance of the delivery network used to distribute the enormous amount of Netflix in a scalable, robust, and cost-effective fashion.

Netflix also measures an aspect of content delivery beyond its control – ISPs. The company publishes the Netflix “ISP Speed Index,” a monthly global statistical breakdown of content download performance.

“We think about the response time of servers, we think about our cloud utilization in AWS, both from a performance and a cost standpoint, we think about the availability of our service,” Sullivan said. “It should look and feel like DVD content, you hit play and it should be on. We measure our rate of innovation. Normally you don’t want to touch something if you want to keep it from breaking. We want to change it constantly and we want to be highly available, so we stretch that muscle all the time.”

Nielsen and Intel Migrate HPC Efficiency and Data Analytics to Big Data

Nielsen has collaborated with Intel to migrate important pieces of HPC technology into Nielsen’s big-data analytic workflows including MPI, mature numerical libraries from NAG (the Numerical Algorithms Group), as well as custom C++ analytic codes. This complementary hybrid approach integrates the benefits of Hadoop data management and workflow scheduling with an extensive pool of HPC tools and C/C++ capabilities for analytic applications. In particular, the use of MPI reduces latency, permits reuse of the Hadoop servers, and co-locates the MPI applications close to the data.

John Mansour, vice president, Advanced Solutions Group, at Nielsen became interested in the integration of both Hadoop and HPC technology to enable faster, better, and more powerful analysis of the huge volumes of data collected by Nielsen as part of their Consumer Packaged Goods (CPG) market research. Nielsen is well-known for the ‘Nielsen ratings’ of audience measurement in Television, Radio, and online content. The company also provides Retail Measurement Services (RMS) that track and report on CPG sales around the world to understand sales performance. The success of Nielsen’s efforts is presented in his talk Bridging the Worlds of HPC and Big-Data at Supercomputing 2015.

Nielsen already utilizes the Cloudera Hadoop infrastructure to ingest and manage a daily deluge of data used in their market research. What Nielsen wanted was to make this infrastructure HPC-friendly so the wealth of scientific and data-analytic HPC codes created since the 1960s could be added to the Nielsen set of computational tools. This required integrating MPI (Message Passing Interface), which is the distributed framework utilized by the HPC community, into the Cloudera Hadoop framework. This integration allows Nielsen the choice of using C/C++ MPI in addition to Spark and Map-Reduce for situations that either require the performance or are a team’s preferred language.

Nielsen thinks Integrating Hadoop and MPI brings together the best of two complementary technologies. This integration will provide the data management capabilities of Hadoop with the performance of native MPI applications on the same cluster. Intel and Cloudera plan to provide production support for this integration in future releases of their software while Nielsen continues to explore the possibilities that such an integration will have for their clients.

MPI has been designed and refined since the 1990s to remove as much communications overhead from distributed HPC applications as possible, while Hadoop and cloud computing infrastructure in general have been designed to run at scale on COTS (commodity off-the-shelf) hardware, where fault and latency tolerance are requirements. A successful integration of the two means existing MPI and data-analytic codes can be ported without being re-implemented in another framework such as Spark and, very importantly, without affecting existing operational cloud infrastructure.

The integration, performed in collaboration with Intel, is quite straightforward from a high-level perspective: a Python script requests resources based on a set of input parameters and writes out a machine file that can be used by mpiexec to run the MPI job. The script then starts the MPI run and cleans up resources upon completion.

In actuality, the process is more complicated, as it is necessary to ensure the data is in the right place and that errors are correctly handled. Nielsen uses Cloudera’s Llama as the application master and YARN as the resource manager.
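The article describes that launch flow only at a high level. As an illustrative sketch – not Nielsen’s actual script – the sequence might look roughly like the following in Python, where the resource-request helpers and host names are hypothetical placeholders for the YARN/Llama negotiation:

import subprocess
import tempfile


def request_nodes(num_nodes, cores_per_node):
    # Placeholder: in the workflow described above, this step would negotiate
    # containers with YARN (via Cloudera's Llama) and return the hostnames
    # that were granted. Hardcoded here purely for illustration.
    return [f"worker{i:02d}.example.com" for i in range(num_nodes)]


def release_nodes(hosts):
    # Placeholder for returning the granted containers to the resource manager.
    pass


def run_mpi_job(binary, num_nodes=8, cores_per_node=16):
    hosts = request_nodes(num_nodes, cores_per_node)

    # mpiexec reads the granted hosts from a machine file
    # (machine-file syntax varies between MPI implementations).
    with tempfile.NamedTemporaryFile("w", suffix=".hosts", delete=False) as mf:
        for host in hosts:
            mf.write(f"{host} slots={cores_per_node}\n")
        machine_file = mf.name

    try:
        subprocess.run(
            ["mpiexec", "-n", str(num_nodes * cores_per_node),
             "-machinefile", machine_file, binary],
            check=True,
        )
    finally:
        # Clean up resources on completion, as described above.
        release_nodes(hosts)


if __name__ == "__main__":
    run_mpi_job("./nielsen_analytic_app")  # hypothetical MPI binary

In practice the machine-file format and mpiexec flags depend on the MPI implementation and cluster configuration in use.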

The performance of MPI in the Nielsen Hadoop framework has been superb and is expected to get even better. In testing against other Hadoop technologies, Nielsen has found MPI to consistently outperform them. Speedups come from the use of C/C++, sophisticated numerical libraries such as those offered by NAG, and MPI’s design for low-latency communications, which helps in tightly coupled operations such as the reductions needed in regression and machine learning applications. Nielsen will provide more detailed performance comparisons in a future publication, but typically sees a factor of roughly 5 to 10 times in performance compared with Spark 1.5.1.
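Nielsen’s production codes are C/C++, but the tightly coupled reduction pattern credited above can be sketched with mpi4py, assuming that library is available; the data here are synthetic and the example is illustrative only:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Hypothetical local shard: in the real workflow each rank would read the
# HDFS block co-located with it; here the data are synthetic.
rng = np.random.default_rng(rank)
X_local = rng.normal(size=(100_000, 8))
y_local = X_local @ np.arange(1, 9) + rng.normal(scale=0.1, size=100_000)

# Partial sums for the normal equations (X^T X) beta = X^T y.
xtx_local = X_local.T @ X_local
xty_local = X_local.T @ y_local

# Low-latency reductions combine the partial results across all ranks.
xtx = comm.allreduce(xtx_local, op=MPI.SUM)
xty = comm.allreduce(xty_local, op=MPI.SUM)

beta = np.linalg.solve(xtx, xty)
if rank == 0:
    print("fitted coefficients:", beta)

Launched with something like mpiexec -n 16 python partial_regression.py, each rank reduces only two small arrays, which is where MPI’s low-latency design pays off.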

All of this work to date has been at the proof-of-concept (POC) stage. In particular, high-performance storage I/O has proven to be an issue, with significant amounts of runtime – sometimes as much as 85% – being consumed by data loads. The challenge is that HDFS, which is written in Java, appears to be a bottleneck. Nielsen is experimenting with different technologies, including local file systems and newer APIs such as RecordService and libhdfs3. Unfortunately, common MPI data methods like MPI-IO present problems in Hadoop.

In addition to optimizing I/O performance, Nielsen has demonstrated significant performance benefits from preloading data into distributed shared memory using Boost shared-memory STL vectors. With working MPI support and the ability to integrate existing C/C++ codes, Nielsen has opened the door to a wealth of computational tools and analytic packages. In particular, the NAG library is a well-known, highly regarded numerical toolkit. For example, NAG offers routines for data cleaning (including imputation and outlier detection), data transformations (scaling, principal component analysis), clustering, classification, regression models, machine learning methods (neural networks, radial basis functions, decision trees, nearest neighbors), and association rules, plus a plethora of utility functions.
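Nielsen’s preloading is implemented with C++ Boost shared-memory STL vectors; purely as a rough Python analogy of the same load-once, read-many idea – and not the article’s implementation – the pattern can be sketched with the standard library’s shared memory support (Python 3.8+):

import numpy as np
from multiprocessing import shared_memory


def publish(data, name="demo_block"):
    # One-time load: copy the dataset into a named shared-memory segment.
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes, name=name)
    view = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    view[:] = data
    return shm  # keep a reference so the segment stays alive


def attach(shape, dtype, name="demo_block"):
    # Readers map the existing segment instead of re-reading from storage.
    shm = shared_memory.SharedMemory(name=name)
    return np.ndarray(shape, dtype=dtype, buffer=shm.buf), shm


if __name__ == "__main__":
    block = np.arange(1_000_000, dtype=np.float64)
    owner = publish(block)
    readonly, handle = attach(block.shape, block.dtype)
    print(readonly[:5])
    handle.close()
    owner.close()
    owner.unlink()  # remove the segment once every process is done with it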

Author Bio:
Rob Farber is a global technology consultant and author with an extensive background in scientific and commercial HPC plus a long history of working with national labs and corporations. He can be reached at info@techenablement.com.

MapR Rolls Hadoop Migration ‘Easy Button’

MapR Technologies Inc. rolled out a migration service for its Hadoop distribution that targets what it says is growing demand for moving big data production installations to its converged data platform.

San Jose-based MapR said this week its quick migration service keeps an existing Hadoop distribution running while transitioning users to its converged data platform. The migration service also seeks to leverage a newly announced Cisco Systems (NASDAQ: CSCO) appliance for SAP HANA (NYSE: SAP) that integrates the MapR converged platform. Cisco’s UCS Integrated Infrastructure for SAP HANA, which incorporates the MapR platform, is based on the B460 scale-out platform with C240 storage servers.

MapR said the migration service essentially creates an "easy button" for moving to the MapR Hadoop distribution as enterprises shift mission-critical applications to converged platforms. The service is touted as a way for enterprises to move their data with minimal impact on real-time applications and workflows.

The database specialist also pointed to several use cases involving an unnamed financial institution and a government agency in which competing Hadoop distributions were shifted to its converged data platform. The bank is now running multiple real-time applications on fewer nodes while the government agency is using the MapR platform for real-time mirroring across distributed datacenters, the company said.

The data platform cluster is installed and configured based on a user's existing IT infrastructure. Use cases and data are then moved to the MapR cluster while the existing Hadoop distribution continues to operate, preventing downtime or data loss.

The first step in the transition to the new data platform is data ingestion, which includes identifying multiple data sources and file formats. The data are then loaded into structures designed to facilitate analytics. Application development includes building Java MapReduce jobs and implementing scripts in Apache Pig, the platform for creating MapReduce programs on Hadoop.

Along with building distributed indexes, the migration service also constructs Apache Hive (HiveQL) queries and builds new data models.
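The article does not say which client tooling MapR’s service uses; simply to illustrate what issuing a HiveQL query programmatically can look like, here is a hypothetical example using the PyHive client, with made-up host, table and column names:

from pyhive import hive

# Hypothetical connection details and schema, for illustration only.
conn = hive.Connection(host="hive-gateway.example.com", port=10000,
                       username="etl_user", database="analytics")
cursor = conn.cursor()

cursor.execute("""
    SELECT region, SUM(sale_amount) AS total_sales
    FROM   pos_transactions
    WHERE  sale_date >= '2016-01-01'
    GROUP  BY region
""")

for region, total_sales in cursor.fetchall():
    print(region, total_sales)

cursor.close()
conn.close()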

Once data is migrated to the MapR converged platform, the company said it dispatches data scientists to determine specific use cases and business priorities, as well as existing workflows and data sources available for big data analytics. After raw data is moved to the platform and restructured, the service includes the creation of a custom data model based on specific use cases.

The service also illustrates how big data vendors are striving to make tools like Hadoop and Apache Spark more widely available to enterprises as in-house analysts become more proficient with data analytics. The MapR initiative is part of a larger effort to move enterprise data closer to compute, storage and other IT resources to facilitate big data analytics on converged platforms.

 

An Insider’s Guide to Building a Cloud-based FinServ Big Data Analytics Shared Service

Even if you’ve never lived through a data analytics development project at a financial services organization, you know it’s a complicated undertaking. But you may not know just how multifaceted, how layered, how political – how hard – a project of this kind can be.

Technology is only half the challenge.

Julio Gomez, a veteran technology manager and consultant in financial services for more than 20 years, knows all this. Speaking at the Big Data Everywhere conference in Boston this week, Gomez, an industry strategy partner at New York-based data analytics consultancy Knowledgent, delivered an A-to-Z portrait of a 12-month big data analytics shared service engagement at a major financial institution (which will remain anonymous).

It began in June 2014, when the firm faced a classic big data challenge: its data was siloed, its business units were siloed – everything was siloed – leaving the company unable to share data or gain organization-wide insight. Within the sequestered business units, what little “analytics” was done was actually data management, data preparation and wrestling with disparate data types. The company realized its data had tremendous potential value that lay dormant.

What was needed was a cross-functional, cross-organizational system that would reach across all the business units and functional areas of the company, providing a way to release all the data through a centralized shared service that could perform advanced analytics – and that automated much of the byzantine data management complexity consuming so much time and resources.

Julio Gomez of Knowledgent

“It was a system that would be capable of accessing the data, ingesting the data, cleansing the data, understanding data lineage, and then making it available for consumption,” Gomez said. “If we could manage all that within the shared service, that would alleviate the business units of that burden and free them up to focus and brainstorm on the analytics side – that was our objective.”

Among the challenges Knowledgent faced was the client’s lack of data analytics wherewithal.

“This is still early days for big data,” Gomez said. “Imagine, back in mid-2014, the task that was at hand. The common theme: folks wanted to leverage big data – whatever that was – and they wanted to do advanced analytics. So the group decided, based on this common theme, that it might be good, too, if they actually collaborated and did this together, as a shared service. That was the genesis of the project.”

It was decided that the initial Proof-of-Concept phase would focus on areas of the company where there was a relative lack of data management maturity, where reducing the pain and friction of data management in support of analytics would do the most good.

Here is where Knowledgent’s educational challenge began.

“It was really an attempt to introduce an Agile methodology and be iterative in the process of building out a new organization, as opposed to a waterfall,” Gomez said. “We wanted to work on being more iterative with the business units as well as paying very close attention to the internal and external requirements for the usage of this data.

“This is something we came across over and over: we’re going to run into problems with how we can use this data so we need to plan for that up front and ensure we’re facile in dealing with those concerns. That may sound good when you think about planning for it, but it’s not easy.”

Knowledgent was exacting in its planning discussions with the client.

“The first thing we had to do was get on the same page for our mission, to really articulate this,” Gomez said. “We worked out the mission and operating values, and that was actually a lot of work. When you have a blank canvas and you have an organization trying to do something transformative, it’s actually very difficult to do. So we worked hard to establish the mission statement to allow us to govern how we built out the service going forward. And then we set out to do the internal selling.”

This initial phase also included framing the interaction between the shared service and the business units. The business units, meanwhile, focused on the business side of the shared service’s mission, “the hypotheses that go into the analytic models, the modeling itself, the testing and the analytic insights that could then be harvested and productionalized, all of it under the umbrella of a proper governance organization and set of principles.”

Gomez said it is often a lack of thoroughness in scoping the mission and operational side of a system that leads to failure.

“I want to emphasize that in my experience this aspect of a project can get short shrift,” he said. “We’re often very quick to determine we already know what is needed. Too often, we take a few bits and pieces of information from the business and assume that is all we need to know to go forward. But only with thoroughly engaged and consistent dialogue can you really tease out the key information you need to make the service a success.”

That dialogue also helps secure buy-in from the business units as the project unfolds.

As the mission scope process neared completion, Knowledgent also tasked itself with building out a technology roadmap and design that incorporated the strategic decision – made by the client – that part of the system would reside in the cloud, impacting both the conceptual and logical architecture. “That was a challenge, but it was a requirement, so we had to deal with that,” said Gomez.

By this time, Knowledgent had developed an operating model and a technology architecture. Next step: put the concept into a POC system.

In their scoping work, Knowledgent realized that the use cases developed with the business units fit neatly into three broad categories: selling, modeling and risk management. Now Knowledgent was ready to identify the POC it would develop based on the value it would provide to a business unit and the feasibility and readiness of that organization.

In identifying use cases, Gomez said, “it’s incredibly important to go at least one level deeper, to figure out not just the description of the use case but also what is the business value that it creates, and then to articulate that very specifically. You also need to understand what it takes to execute against it, what are the categories of business data sources that you need to access.”

For the first POC, the mutual fund distribution business unit was selected. “All their analytics people were working on managing the data. They wanted to see the impact of some of their campaigns on sales and draw that correlation, and we aspired to show them that.”

A lot was at stake. The POC would – or would not – deliver proofs of value, demonstrating not only the technology but also “our ability to work as an organization, to test our processes and to test our ability to work with a business unit.” On the technical side, a key goal was to expand the types of data sets that could be rationalized, managed and analyzed.

Knowledgent set out to build the big data environment – not just the architectural layers “but also what was needed to make it all work together,” connecting on-premises capabilities with cloud capabilities, a highly complex proposition. “We were dealing with connection, permission and firewall issues, PII. But we relentlessly pushed this thing forward.”

Three months later, in August 2015 (nine months into the project), the POC was completed. “We basically have a well-defined, well-planned organization; we’ve got an established technical environment.”

And Knowledgent had a happy client. Gomez said the head of the mutual fund distribution business unit “was seeing things they’d never seen before in terms of insights, and they really wanted to go forward and take it to the next level, turning the POC into something they could productionalize.”

This added a higher level of complexity to Knowledgent’s work.

“We started running teams, bringing data in through the environment,” Gomez said. “It wasn’t smooth, we kept iterating, fixing things, strategizing, getting creative, really getting after it.”

Meanwhile, Knowledgent and the client began the laborious process of developing data governance policies and procedures for taking the shared system across multiple departments. “We had a complex institutional organization to deal with, everyone had a finger in that pie, as you can imagine, so there was a lot of coordinating that was required within different organizations within the enterprise. We forced a lot of issues to the table. We had to because we couldn’t go into production until those issues got solved. In many ways that was the biggest impediment to progress, mobilizing the people who had a voice but not a decision. Everyone had to be made happy in this particularly critical area.”

Knowledgent also moved on to Wave Two Execution of the project across four business units simultaneously. “We were dealing with prickly issues like multi-tenancy, it was really a challenging time for us.” In addition, some functions performed in “a fairly manual way” in the initial POC phase needed to be automated. “We were growing, we were grinding, we were smoothing.”

This was the focus of the final four months of the project; this phase included testing the technology environment, resolving governance issues, hardening the operating model, and building up the shared service staff.

By the beginning of this year, Knowledgent was done, having delivered a shared service that Gomez said handles some of the most complex data issues facing business units trying to achieve higher analytics capabilities.

“It’s a partnership with the business units,” Gomez said, “it’s a partnership with the architecture within the enterprise, it’s a partnership with all the governance stakeholders. It’s being able to develop that partnership with all these units that is the difference between having your big data initiative be a strategic asset for the enterprise versus being put on the shelf as an interesting experiment.”

Machine Data Analytics for DevOps: Five Tips

As more organizations adopt a Continuous Integration/Continuous Delivery approach, in which software is developed in shorter cycles and deployed into production at higher velocity, they need a new breed of tools that run at cloud scale and can be seamlessly integrated with a host of DevOps tools across the entire pipeline.

To help achieve this speed and scale, increasing numbers of cross-functional teams (development, operations, customer success/support, SecOps and line of business) are using analytics to monitor, manage and gain insights from user, application, or infrastructure logs. A new class of automated tools is enabling teams to understand error rates, failures and other information in massive amounts of log and machine data.
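As a toy illustration of the kind of question such tools answer – the log format and alert threshold below are invented, and this is no substitute for the platforms discussed in this article – computing an error rate from raw log lines might look like this:

import re
from collections import Counter

LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARN|ERROR|FATAL)\b")


def error_rate(lines):
    # Count log levels and return the share of ERROR/FATAL entries.
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    errors = counts["ERROR"] + counts["FATAL"]
    return errors / total if total else 0.0


sample = [
    "2016-07-01T12:00:01 INFO  checkout service started",
    "2016-07-01T12:00:02 ERROR payment gateway timeout",
    "2016-07-01T12:00:03 WARN  retrying request",
]
rate = error_rate(sample)
if rate > 0.01:  # hypothetical alerting threshold
    print(f"error rate {rate:.1%} exceeds threshold")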

Machine data analytics is a relatively new field, and it’s important to know what to look for in a powerful, built-for-the-cloud solution. Here are the top five must-haves:

1: New Functionality without the Wait

With a cloud-native machine data analytics SaaS, you have the simplicity of being up and running in minutes, as opposed to weeks or months with traditional, packaged enterprise software. Cloud-native solutions are designed for velocity; new features can be added at a much faster rate, enabling faster time-to-value and better quality for the same price. New updates are rolled out to all users at the same time, so no one is left behind on an outdated version. The result is a machine data analytics platform that keeps pace with the shorter development cycles and agile methodologies you employ to drive your own software release velocity.

2: The Elasticity You Need Without Sticker Shock

In a rapid DevOps world of continuous technology and underlying infrastructure changes, problems can surface when you least expect them. Consider the moment when your production applications or infrastructure has a serious problem and it is “all hands on deck” to investigate and resolve. Suddenly, you have your entire team running simultaneous queries with debug mode activated, bringing in more data than usual.

Suku Krishnaraj of Sumo Logic

From a DevOps perspective, this scenario is where the elasticity of a multi-tenancy platform outshines a single tenant solution. Since only a small percentage of customers have “incidents” at the exact same time, excess capacity is always available for you and your cross-functional team to use in your hour of crisis. This elasticity also comes in handy to address the expected or unexpected seasonal demands on your application environment. Lastly, with a cloud-based metering payment model, your elasticity does not come at an extra price. On the contrary, you pay for your “average” capacity, even though there may be times when you utilize more to handle those bursts that require five, 10 or 100 times more capacity.

3: Scalability Without Performance Hits

For DevOps engineers, machine-generated log data is growing exponentially as their infrastructure and application stacks grow in complexity. By some estimates, IT departments will have 50 times more data to index, search, store, analyze, alert and report on by 2020. To address this demand, cloud-native solutions have dedicated tiers for each log management function (ingest, index, search, etc.) that scale independently to keep up with your data’s volume and velocity of change. And there’s no hassle with managing search nodes or heads in order to ensure search performance. Since not all cloud-based solutions are alike, be sure to understand how much data you can use before being locked out. Remember, it’s precisely the moment you have a problem when you’ll need to scale.

4: Reliability. Reliability. Reliability.

Modern cloud-driven strategies, such as aggregating computing resources in a multi-tenant, native cloud architecture, provide built-in availability for peak capacity without incurring payment penalties for bursting. This architectural approach results in always-on service continuity and availability of your data. Additionally, cloud SaaS ensures enterprise-class support for you and your team: solving customer problems quickly lets SaaS vendors deliver value across the entire customer base, which is a strong motivator for the vendor.

5: Never Overbuy or Overprovision

Rapid development cycles will also cause your machine data analytics needs to fluctuate over time (e.g., more usage fluctuations, more cross-functional projects bringing in more data), complicating planning. Cloud-native solutions that include metered billing alleviate the need to forecast current and future needs – simply pay as you go for what you use. Additionally, since you will never overbuy or overprovision, your ROI improves over the long term.

As these five qualities show, the right machine data analytics philosophy – with a focus on purpose-built for the cloud, not merely running on the cloud – allows organizations to optimize the joint value of the "dev and ops orchestration" that defines DevOps.

Suku Krishnaraj is the VP of Marketing for Sumo Logic. He most recently headed the Cloud Business Unit at CenturyLink as VP/General Manager, managing P&L and go-to-market for the cloud and managed hosting businesses.

Cray Courts the Enterprise with Pre-Configured Analytics Urika-GX

Cray (NASDAQ: CRAY) continued its courtship of the advanced scale enterprise market with today's launch of the Urika-GX, a system that integrates Cray supercomputing technologies with an agile big data platform designed to run multiple analytics workloads concurrently. While Cray has pre-installed analytics software in previous systems, the new system takes pre-configuration to a new level with an open, “enterprise-ready” software framework designed to eliminate installation, integration and update headaches that stymie big data implementations.

Available in the third quarter of this year, the Urika-GX is pre-integrated with the Hortonworks Data Platform, providing Hadoop and Apache Spark. The system also includes the Cray Graph Engine, which can handle multi-terabyte datasets comprised of billions of objects, according to Cray. The Graph Engine runs in conjunction with open analytics tools for support of end-to-end analytics workflows, minimizing data movement. The system includes enterprise tools, such as OpenStack for management and Apache Mesos for dynamic configuration.

The Urika-GX features Intel’s recently announced Xeon “Broadwell” processors, 22 terabytes of memory and 35 terabytes of local SSD storage capacity. Three initial configurations will be available with 16, 32, or 48 two-socket processor nodes (up to 1,728 cores per system), delivered in a 42U 19-inch rack. Cray said larger configurations will be available in the second half of 2016.

While Cray’s heritage comes out of the traditional supercomputing industry, where technology and integration staff expertise tends to run deep, the Urika-GX is designed for the enterprise market, which generally prefers a “solutions” approach, said Ryan Waite, Cray’s senior vice president of products.

Cray's Ryan Waite

“For IT organizations, it’s hard to get these systems up and running,” he told EnterpriseTech. “Some of the big data systems they’ve bought only come with cookbook recipes, and there’s a lot of heavy lifting to the IT organization. So they buy one of these systems, it looks awesome on paper, they pull out the recipe and now they’re responsible for getting the operating system installed, they’re responsible for getting a particular version of Hadoop installed, they’re responsible for making sure that that version of Hadoop works with some other version of Spark, which works with some other version of some other tools that they’re using. And if any of those components are upgraded, or if there’s a cool new version of one of those tools that the data scientists want, the IT organization again is responsible for all that integration testing, and that can be hard.”

IT organizations also struggle with “frankenclusters.”

“They build one cluster for real-time data ingestion with Kafka, and another one used for ETL into a data processing system, and yet another that’s part of your Hadoop infrastructure, another one that’s part of your Spark infrastructure, so they have lots of different types of clusters,” Waite said. “That fragmentation is hard on IT organizations because it’s wasteful – some clusters are running at high utilization, others at low utilization.”

He said the Urika-GX system helps eliminate these challenges with an open, agile analytics platform that supports concurrent analytics workloads by combining Cray’s Aries supercomputer interconnect with the company’s industry-standard cluster architecture, the Urika-GD scalable graph engine and the pre-integrated, open infrastructure of the Urika-XA system.

“Analytics workflows are becoming increasingly sophisticated with businesses looking to integrate analytics such as streaming, graph, and interactive,” says James Curtis, Senior Analyst, Data Platforms & Analytics at 451 Research. “An agile analytics platform that can eliminate many of the challenges data scientists face, as well as reduce the time it takes to get an integrated environment up and running has become a requirement for many enterprises.”

Cray said early adopters of the new system include life sciences, healthcare and cybersecurity companies. The Broad Institute of MIT and Harvard uses the Urika-GX for analyzing genome sequencing data and reports that quality-score recalibration runs in its Genome Analysis Toolkit (GATK4) Apache Spark pipeline have been reduced from 40 minutes to nine, according to Adam Kiezun, GATK4 project lead at the Broad Institute.


New IBM Data Analytics Software Targets Workload Management, Spark Adoption

IBM announced additions today to the infrastructure layer of its high performance data analytics software portfolio, including “cognitive features,” such as scheduling and resource management, and capabilities aimed at easing adoption of Spark.

The new software-defined infrastructure products, called IBM Spectrum Computing – in which the data center is managed, provisioned and automated by software, regardless of the underlying compute, storage or network components – are intended to reduce the complexity of performance-intensive data analytics and machine learning implementations. They replace IBM’s old “Platform” software portfolio.

Bernard Spang, vice president of IBM Software Defined Infrastructure, told EnterpriseTech that the products are a “blending of what has been traditionally HPC and the need for that technology for enterprise computing, data analytics, Internet of Things,” and other data-intensive workloads.

He said the Spectrum Computing platform offers new resource-aware scheduling policies that will increase compute resource utilization and predictability across multiple workloads, control costs and accelerate results for new generation applications and open source frameworks, such as Hadoop and Apache Spark. It also will assist with consolidating data center infrastructure and sharing resources across on-premises, cloud or hybrid environments.

IBM's Bernard Spang

“We believe this is the industry’s first aggregated software-defined infrastructure offering that includes both software-defined compute and software-defined storage for these new generation scale-out infrastructures,” Spang said. “By combining these capabilities together and building in the intelligence and the automation, we can support a very high-performance and cost efficient infrastructure that can run many of these new generation scale-out workloads.”

The products include:

  • Spectrum Conductor: Designed to speed complex data analytics, it works with cloud applications and open source frameworks, enabling applications to share resources while protecting and managing data throughout its lifecycle.
  • Spectrum Conductor with Spark: Aimed at simplifying adoption of Apache Spark; according to IBM, it delivers 60 percent faster analytical results.
  • Spectrum LSF: Workload management software with interfaces designed to facilitate research and design, and to control costs through resource sharing and improved utilization.

“The new IBM Spectrum LSF version adds important capabilities to one of the world's leading workload managers for HPC,” said Steve Conway, research vice president in IDC's High Performance Computing group, “a market we forecast will grow from $21 billion in 2015 to nearly $30 billion in 2020. The new and enhanced functions are designed to boost productivity and ROI on HPC systems, along with ease of use and mobile operation. But what's most impressive is the integration of LSF into a coherent IBM Spectrum product family that supports HPC, advanced analytics, cloud computing, and other activities that are increasingly important for existing HPC users and for the growing number of commercial firms that are adopting HPC for their most daunting big data challenges.”

While the software runs on OpenPOWER servers, Spang said it also runs on x86 servers, including those from systems vendor Supermicro, which partnered with IBM to optimize its hardware for the new Spectrum Computing products.

“Working with IBM, we have integrated our latest server, storage and networking solutions with IBM Spectrum Conductor with Apache Spark and IBM Spectrum LSF to accelerate deployment of scalable, high-performance analytics infrastructure,” said Charlie Wu, General Manager, Rack Solutions at Supermicro. “Our collaborative efforts enable extraction of more predictable results and insight across hybrid cloud environments.”

New Data Analytics Benchmark Puts Stopwatch to Hadoop-based Systems

Ladies and gentlemen, start your clusters.

A new data analytics and machine learning benchmark has been released by the Transaction Processing Performance Council (TPC) measuring real-world performance of Hadoop-based systems, including MapReduce, Apache Hive, and Apache Spark Machine Learning Library (MLlib).

Called the TPCx-BB benchmark and downloadable at the TPC site, it executes queries frequently performed by companies in the retail industry running customer behavior analytics.

HPE has already seized the benchmark pole position, as it were, issuing results showing that its current-generation ProLiant DL380 Gen9 server, using Xeon E5 v4 processors, came in with a 27 percent performance gain over the previous-generation DL380 server.

The TPCx-BB (BB stands for “Big Benchmark”) is designed to incorporate complex customer analytical requirements of retailers, according to TPC. Whereas online retailers have historically recorded only completed customer transactions, today deeper insight is needed into consumer behavior, with relatively straightforward shopping basket analysis replaced by detailed behavior modeling. According to the TPC, the benchmark compares various analytics solutions in a real-world scenario, providing performance-vs.-cost tradeoffs.

“With the advent of so many big data and analytics systems – from an array of hardware and software vendors – there is immediate demand for apples-to-apples, cross-platform comparison,” said Bhaskar Gowda, chairman of the TPCx-BB committee (and senior staff engineer at Intel’s Data Center Group).

Bhaskar Gowda

Gowda told EnterpriseTech the benchmark tests various data management primitives – such as selects, joins and filters – and functions. “Where necessary, it utilizes procedural programs written using Java, Scala and Python. For use cases requiring machine learning data analysis techniques, the benchmark utilizes Spark MLlib to invoke machine learning algorithms by providing an input dataset to the algorithms processed during the data management phase.”

He said the benchmark exercises the compute, I/O, memory and efficiency of various Hadoop software stacks (Hive, MapReduce, Spark, Tez) and runs tasks resembling applications developed by an end-user with a cluster deployed in a datacenter, providing realistic usage of cluster resources.

For machine learning use cases, Gowda reiterated, the benchmark invokes Spark MLlib algorithms on datasets produced during the data management phase.
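To make that pattern concrete – and only as an illustration, since the actual TPCx-BB queries, schemas and kit are defined by the TPC specification – a data-management step feeding an MLlib-style algorithm might be sketched in PySpark as follows, with hypothetical paths, tables and column names:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("tpcxbb-style-sketch").getOrCreate()

sales = spark.read.parquet("/data/store_sales")  # hypothetical paths and schemas
items = spark.read.parquet("/data/item")

# "Data management" phase: join, filter and aggregate per-customer behavior.
baskets = (sales.join(items, "item_id")
                .filter("category = 'Electronics'")
                .groupBy("customer_id")
                .agg({"quantity": "sum", "net_paid": "sum"})
                .withColumnRenamed("sum(quantity)", "qty")
                .withColumnRenamed("sum(net_paid)", "spend"))

# Machine-learning phase: cluster customers on the engineered features.
features = VectorAssembler(inputCols=["qty", "spend"],
                           outputCol="features").transform(baskets)
model = KMeans(k=8, featuresCol="features", seed=42).fit(features)
model.transform(features).select("customer_id", "prediction").show(5)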

Other phases of the benchmark include:

Load: tests how fast raw data can be read from the distributed file system and prepared by applying various optimizations, such as compression and different data formats (ORC, text, Parquet).

Power: tests the system using short-running jobs with less demand on cluster resources, and long-running jobs with high demand on resources.

Throughput: tests the efficiency of cluster resources by simulating a mix of short and long-running jobs, executed in parallel.

For the record, according to HPE, the 12-node ProLiant cluster used in the first test run on the benchmark had three master/management nodes and nine worker nodes with RHEL 6.x OS and CDH 5.x Hadoop distribution. It ran a dataset of about 3TB. Comparing current- versus previous-generation ProLiant servers, HPE reported a 27 percent performance gain and a cost reduction of 9 percent.

SIEM Gains as Consumer Security Software Fades

Security information and event management (SIEM) software fueled a robust global security software market in 2015 even as sales of consumer security software declined sharply last year, according to the latest accounting by market analyst Gartner Inc.

Gartner (NYSE: IT) reported this week that worldwide security software revenue jumped 3.7 percent over the previous year to $22.1 billion. Sales of SIEM software used to support threat detection and response to security breaches rose by a whopping 15.8 percent year-on-year as it gained market traction via its real-time collection and analytics capabilities. The analytics software is used to sift through a wide variety of event and contextual data sources to provide an historical analysis of security breaches.

Meanwhile, Gartner reported that global sales of consumer security software tanked in 2015, dropping 5.9 percent on an annual basis. Market leader Symantec (NASDAQ: SYMC) took the biggest hit, with annual revenues declining by an estimated 6.2 percent from the previous year.

Overall, leading consumer vendors registered a collective decline in revenues estimated at 4.2 percent in 2015. The declines for Symantec and second-ranked vendor Intel Corp. (NASDAQ: INTC) were attributed to a drop in consumer security and endpoint protection platform software. The latter combines device security functionality into a single capability that delivers antivirus, anti-spyware, firewall and host intrusion prevention.

Of the top five vendors ranked by Gartner, only IBM registered revenue growth last year on the strength of its SIEM sales along with its service business, which the market watcher noted also generates demand for its product segment. IBM (NYSE: IBM), which integrated its SIEM platform with market leader Resilient Systems last year, acquired the “incident response” specialist earlier this year.

"The below-market growth seen by these large vendors with complex product portfolios is in contrast to the market growth and disruption being introduced by smaller, more specialized security software vendors," Gartner research analyst Sid Deshpande noted in a statement releasing the revenue totals.

The sharp decline in consumer security software also reflects the growing sophistication of security breaches such as ransomware and the desire by more enterprises to detect and blunt attacks as they unfold. Businesses also are realizing that upfront investments in analytics-based approaches like SIEM may yield future savings as the cost of dealing with a single security breach can easily reach into the millions of dollars.

Hence, the core capabilities of SIEM technology are increasingly seen as a more comprehensive way of collecting data points on security "events" along with the ability to correlate and analyze those events across a range of data sources, Gartner noted.
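As a deliberately simplified illustration of what correlating events across data sources can mean in practice – the event schema, window and threshold below are invented and far cruder than commercial SIEM analytics – a basic correlation rule might look like this:

from collections import defaultdict
from datetime import timedelta

WINDOW = timedelta(minutes=5)   # hypothetical sliding window
THRESHOLD = 10                  # hypothetical failure count


def brute_force_candidates(events):
    """events: iterable of dicts like {'ts': datetime, 'type': str, 'src_ip': str}."""
    failures = defaultdict(list)
    alerts = set()
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["type"] != "auth_failure":
            continue
        window = failures[ev["src_ip"]]
        window.append(ev["ts"])
        # Drop failures that have fallen out of the sliding window.
        while window and ev["ts"] - window[0] > WINDOW:
            window.pop(0)
        if len(window) >= THRESHOLD:
            alerts.add(ev["src_ip"])
    return alerts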

So-called "operational intelligence" vendor such as Splunk Inc. (NASDAQ: SPLK) have recently released new versions of security and user behavior analytics packages. The new capabilities are said to combine the best features of machine learning and anomaly detection to sift through and prioritized data breaches and other threats.

Meanwhile, other emerging SIEM platforms are designed to automate security processes and policies used to respond to everything from insider attacks to lost mobile devices.

Enterprises Embrace Machine Learning

Machine learning technology is poised to move from niche data analytics applications to mainstream enterprise big data campaigns over the next two years, a recent vendor survey suggests.

SoftServe, a software and application development specialist based in Austin, Texas, reports that 62 percent of the medium and large organizations it polled in April said they expect to roll out machine learning tools for business analytics by 2018. That majority said real-time data analysis was the most promising big data opportunity.

The survey authors argue that artificial intelligence-based technologies like machine learning are moving beyond the "hype cycle" as enterprises look to automate analytics capabilities ranging from business intelligence to security. (In the latter case, the Defense Advanced Research Projects Agency is sponsoring an "all-machine hacking tournament" in conjunction with next month's DEF CON hacking convention in Las Vegas. The goal is to demonstrate that cyber defenses can be automated as more infrastructure is networked via an Internet of Things.)

The survey found that the financial services sector is among the early adopters of big data analytics and emerging approaches such as machine learning. About two-thirds of financial services companies said analytics was a "necessity" to stay competitive while 68 percent said they expect to implement machine-learning tools within the next two years.

Among the incentives for early adoption is growing pressure on financial institutions "to close the gap between the experiences they provide and what consumers have come to expect," the survey authors noted. Big data is increasingly seen as a way to meet client demand for faster and more accurate service, they added.

For the IT sector, big data is widely viewed as a way to reduce operating costs, for example through savings on software licensing and commodity hardware.

Meanwhile, tools like machine learning are also perceived as helping to break down data silos while improving the quality of business intelligence data used in decision-making. The survey cited estimates that poor-quality data can cost businesses as much as $14 million a year. "A big data transformation is able to overcome this challenge by systematically integrating these silos – and turning bad data into good information," the survey asserts.

"Businesses that take the plunge and implement machine learning techniques realize the benefits early on – it’s big a step forward because it delivers prescriptive insights enabling businesses to not only understand what customers are doing, but why," Serge Haziyev, SoftServe's vice president of technology services, noted in a statement.

The survey of 300 executives in the U.K. and U.S. also found that the retail sector is most concerned about data governance issues.

HPE Gobbles SGI for Larger Slice of HPC-Big Data Pie

Hewlett Packard Enterprise (HPE) announced today that it will acquire rival HPC server maker SGI for $7.75 per share, or about $275 million, inclusive of cash and debt. The deal ends the seven-year reprieve that kept the SGI banner flying after Rackable Systems purchased the bankrupt Silicon Graphics Inc. for $25 million in 2009 and assumed the SGI brand.

Bringing SGI into its fold bolsters HPE’s high-performance computing and data analytics capabilities and expands its position across the growing commercial HPC market and into high-end supercomputing as well. Per analyst firm IDC’s latest figures, the HPC market is at $11 billion and set to grow at an estimated 6-8 percent CAGR over the next three years. The data analytics segment, which is very much in play here, is said to be growing at over twice that rate. “Big data combined with HPC is creating new solutions, adding many new users/buyers to the HPC space,” stated IDC in its June HPC market update.

A joint announcement from HPE and SGI focused on how this explosion in data is driving increased adoption of high-performance computing and advanced analytics technologies in government and commercial sectors. HPC systems are critical for advancing such fields as weather forecasting, life sciences, and increasingly for cybersecurity and fraud detection, said HPE.

“Once the domain of elite academic institutions and government research facilities, high-performance computing (HPC) – the use of ‘super’ computers and parallel processing techniques for solving complex computational problems – is rapidly making its way into the enterprise, disrupting industries and accelerating innovation everywhere. That’s because businesses today are recognizing the big potential in the seas of their corporate data,” Antonio Neri, executive vice president and general manager, HP Enterprise Group, shared in a blog post.

He continued: “Organizations large and small are adopting HPC and big data analytics to derive deeper, more contextual insights about their business, customers and prospects, and compete in the age of big data. These businesses see revenue opportunity in the explosion of data being generated from new sources, like the proliferation of mobile devices, the Internet of Things, the ever-expanding volumes of machine-generated data, and the increase of human data in the form of social media and video.”

SGI CEO Jorge Titinger also emphasized the benefits of the union for data-driven organizations. “Our HPC and high performance data technologies and analytic capabilities, based on a 30+ year legacy of innovation, complement HPE’s industry-leading enterprise solutions. This combination addresses today’s complex business problems that require applying data analytics and tools to securely process vast amounts of data,” he said. “The computing power that our solutions deliver can interpret this data to give customers quicker and more actionable insights. Together, HPE and SGI will offer one of the most comprehensive suites of solutions in the industry, which can be brought to market more effectively through HPE’s global reach.”

SGI makes server, storage and software products, but it’s the UV in-memory computing line that has lately been the coveted star of the company’s portfolio. In February, SGI signed an OEM agreement with HPE for its UV 300H technology, a version of the SGI UV 300 supercomputer that is purpose-built for SAP HANA. As we noted previously, the 8-socket server “filled the gap between its HPE ProLiant DL580 Gen9 Server, with 4-socket scalability at the low end, and the HPE Integrity Superdome X server that scales up to 16 sockets and 24 TB of memory at the high end.”

Notably Dell and Cisco are both resellers for the entire SGI UV 300H line, which scales as a single node system from 4-32 sockets in four socket increments. Just how the SGI sale will affect these arrangements remains to be seen, but it’s hard to imagine Dell as a reseller for HPE.

In the high-end supercomputing segment (systems above $500,000, per IDC), HPE was the top earner among HPC server vendors in 2015, taking in $1.23 billion in revenue out of a total $3.28 billion. Cray came in second ($583 million), followed by Lenovo ($391 million). SGI’s share was $88 million.

IDC 2015 Revenue Share by Vendor - supercomputing

SGI, now located in Milpitas, Calif., after selling its storied Silicon Valley headquarters to Google in 2006, brought in $533 million total revenue in FY16 and $521 million in FY15. Its GAAP net loss for 2016 was $11 million, or $(0.31) per share compared with a net loss of $39 million, or $(1.13) per share in 2015. The company has approximately 1,100 employees.

The deal's $7.75 per share price represents a 30 percent premium over today's closing price of $5.98. In after hours trading, shares of SGI have gone up by nearly 30 percent to $7.70. HPE's stock closed today at $21.78, falling just .05 percent in after hours trading to $21.77.

The transaction is on track to close in the first quarter of HPE’s fiscal year 2017, after which SGI will become part of the HPE Data Center Infrastructure group, led by Alain Andreoli. HPE expects the effect on earnings to be neutral in the first full year after close of sale and accretive thereafter.

The SGI purchase is the latest in a series of big changes for the HP brand. Last September, Hewlett-Packard officially split the PC and printer business from its enterprise (and HPC) activities, creating Hewlett-Packard Enterprise (HPE) to focus on servers, storage, networking, security and corporate services. In May, HPE went through another split when it merged its enterprise services unit with CSC to create a $26 billion “pure-play” IT services organization.
