This episode bundles some of the latest news and trends in the container space, as well as the impact of .NET Core 1.0 being released to the world of open source. Guest author Kevin Hoffman joins the podcast to discuss his recent book Beyond the Twelve-Factor App (including 3 more factors!) and to provide color on the trends across containers and .NET and how they are affecting app development.
The new release of Apache MADlib 1.9 (incubating) includes support vector machines, which can be used for classification and regression tasks. SVM models have two particularly desirable features: robustness in the presence of noisy data and applicability to a variety of data configurations. This post describes the distributed implementation of SVM in MADlib for very large data sets and shows some scalability test results.
Earlier this year, a highly pedigreed group of big and fast data professionals gathered at the Apache Geode Summit. While all of the session videos and slides are now online, we wanted to highlight three major themes from the conference and point out the related videos and slides.
One of the biggest announcements coming out of Google Cloud Platform’s Next Conference last week was about Apple moving workloads from AWS, but there is much more to the story than the headline. Poly-cloud is gaining ground on several fronts, from financial justifications to customer migrations to product developments. We cover it in this week’s BUILD Newsletter.
Today, VMware and Pivotal share an important milestone in our promise to deliver a next-generation, turnkey Cloud Native platform that will fundamentally transform how companies deliver and run custom enterprise software. We are announcing the availability of the open source Photon Platform Cloud Provider Interface (CPI) for Cloud Foundry’s BOSH, an API that is used to interact with an underlying IaaS to create and manage objects on that infrastructure, including images, VMs and disks. Simply put, Cloud Foundry users can now manage their application’s lifecycle on the lightweight VMware Photon IaaS.
The Apache Software Foundation (ASF) is one of the open source organizations participating in the Google Summer of Code 2016 (GSoC 2016) program. As a sponsor of the ASF, Pivotal is keen on supporting students looking to work in the complex and growing field of big data by developing features across a number of ASF Incubating projects that power our data products, including Apache Geode (incubating), Apache HAWQ (incubating) and Apache MADlib (incubating). For students around the world, it also offers an opportunity to pair with and learn from Pivotal’s data engineers, as well as earn $5,500! The deadline to apply is Friday, March 25, 2016.
In this post, Scott Hajek, from Pivotal’s famous data science team, explains approaches for working with unstructured text, extracting the data and turning it into structured records. He explains entity recognition and related NLP techniques, such as human-supervised feedback loops, that use machine learning to automate extraction. Several examples are provided.
In this post, one of Pivotal’s data scientists, Scott Hajek, explains how Greenplum Database (either the open source or the Pivotal version) can be used for information extraction. After a brief introduction, he walks through the concepts, the capabilities within Greenplum and the processing steps with plenty of example code.
Open source software, such as the Apache Hadoop® standard within the Big Data realm, has become the default and dominant choice when companies deploy software. Not too long ago, most executives didn’t necessarily see how much of their operations ran on open source. Now, they do. There are three key reasons why, and we outline these in this post along with an upcoming webinar on an Open Source Playbook for 2016.
I recently attended PGconf in Vienna, Austria, where we announced the open-sourcing of Pivotal Greenplum, which has become the first open source massively parallel data warehouse. Now known as Greenplum Database in its open source form, it can be cloned from the GitHub repo and built by anyone, but there is another segment of the community that just wants to try out the functionality of the product without going through that process. For that group, we now have the Pivotal Greenplum Sandbox Virtual Machine, which offers a free trial by packaging the open source Greenplum Database, the commercially available Pivotal Greenplum Command Center management tool, Apache MADlib (incubating), PostGIS, PL/R, PL/Perl, and PL/Java into an easy-to-use virtual machine that runs in either VirtualBox or VMware Fusion.