This month’s Build Newsletter highlights how software and data are everywhere by featuring one of the most widely discussed topics in software development today: the Industrial Internet, also referred to as the Internet of Things (IoT). We also look at the impact of big data cleanliness and the methods for achieving it, another hot topic in the past month. Of course, there is also news and commentary on topics we cover often: app development, big data, in-memory data grids, open source software, and data science.
The massive influx of data, and the role of technologies such as Apache Hadoop® in handling it, are well established among enterprises, industries, and government institutions. But ever-increasing amounts of data and new use cases present new challenges and a need for faster and more malleable technologies. UC Berkeley’s AMPLab brings together academics and businesses to engage in dialogue and collaboration to develop the next generation of big data technologies. Pivotal is a sponsor of AMPLab, supporting UC Berkeley’s effort to connect the brightest minds who are innovating within the big data and data science sphere.
Ian Huston and Alexander Kagoshima of Pivotal Labs delivered a presentation at the Cloud Foundry Summit 2015 demonstrating how they have used Cloud Foundry to deliver data-driven applications to clients. Data scientists synthesize a wide range of skills in their efforts to understand complex data sets and deliver insights, and Cloud Foundry enables practitioners to quickly get to work, rather than losing time setting up servers or performing operations tasks. During their talk, the pair detailed the ways that Cloud Foundry can simplify data science workflows and deliver insights to users.
One of the goals for the Spring XD 1.2 release was to obtain baseline performance metrics on a typical cluster of machines and then optimize stream performance where necessary. Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export. Our testing drove several optimizations to increase streaming performance. The benchmarks found that a single-threaded Spring XD stream can handle over 2 million 100-byte events per second, using Apache Kafka as a transport.
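To put that benchmark figure in perspective, the event rate can be converted into raw payload bandwidth. The short Python sketch below does the arithmetic using only the two numbers quoted above; the function and constant names are ours, chosen for illustration, and the result counts payload bytes only, ignoring any transport or protocol overhead.

```python
# Back-of-the-envelope conversion of an event rate into payload bandwidth.
# The two constants come from the benchmark summary above; the names are
# illustrative and not part of Spring XD or Apache Kafka.
EVENTS_PER_SECOND = 2_000_000  # "over 2 million" events, so this is a lower bound
EVENT_SIZE_BYTES = 100         # payload size of each event


def payload_throughput_mb_per_s(events_per_second: int, event_size_bytes: int) -> float:
    """Return payload throughput in decimal megabytes per second (1 MB = 10**6 bytes)."""
    return events_per_second * event_size_bytes / 1_000_000


if __name__ == "__main__":
    mb_per_s = payload_throughput_mb_per_s(EVENTS_PER_SECOND, EVENT_SIZE_BYTES)
    # Convert megabytes per second to gigabits per second: x8 bits, /1000 mega -> giga.
    print(f"{mb_per_s:.0f} MB/s of payload, or {mb_per_s * 8 / 1000:.1f} Gbit/s")
```

Under these assumptions, the quoted rate corresponds to at least 200 MB/s of event payload moving through a single stream.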
The Spring XD engineering team has some big announcements regarding Spring XD 1.2 and 1.1.3, along with Flo for Spring XD. Focusing on developer experience and productivity, the new features cover Flo, performance optimization, new sources, processors, sinks, and batch jobs, runtime refactoring to act as native apps in Pivotal Cloud Foundry, Apache Ambari-installed clusters, resiliency improvements, registry HA support, and improved integration with Pivotal HAWQ, Pivotal GemFire, Pivotal Greenplum Database, Pivotal HD, and Sqoop.
Today, Pivotal announced an exciting acquisition of big data query technology from the University of Wisconsin-Madison. As part of the acquisition, Professor Jignesh Patel will be joining Pivotal, and he starts his tenure here by sharing why this is such a great move for Pivotal customers, the Quickstep technology, and himself.
Recently, Pivotal’s Michael Cucchi participated in an interview with two of the hosts of theCUBE, SiliconANGLE’s video channel. This post summarizes the interview at a high level, links to the video, and elaborates on some of Cucchi’s key points.
In this post, we share recent research on the business outcomes of big data efforts. First, key statistics published by Accenture, GE, and IBM are highlighted, along with supporting information on the cost factors for technologies like Apache Hadoop compared to legacy systems. The post then summarizes and provides reference information for 20 examples of companies where big data is delivering results, all of which discussed these results in the past year.
EMC World kicks off on Monday, May 4th in Las Vegas with a focus on 2015 being the year of "digital transformation," during which social, cloud, mobile and big data technologies will converge to revolutionize how enterprises do business. Pivotal's products and services will take center stage at the show, with keynote speeches, product demos, and sessions demonstrating the role that big data, cloud technologies, and agile software development play in this significant transition.
Scientists around the world are running experiments and analyses to investigate the nature of climate change. The Big Data vs. Climate Change program is a joint effort by EMC Corporation, Pivotal, and the Earthwatch Institute. It enables the study of interactions between nature and climate, and promotes the engagement of citizen scientists using data lakes, analytic tools, and visualizations. In this episode, Simon is joined by Vatsan Ramanujan, a Principal Data Scientist at Pivotal. Vatsan shares some insight into the work that was done, along with some interesting stories from "out in the field".