In this post, Anirudh Kondaveeti, a Principal Data Scientist at Pivotal, provides an in-depth, real-world example of how data science applies to mechanical and materials engineering in the semiconductor manufacturing industry. Step by step, he covers de-noising, preprocessing, feature extraction, dimensionality reduction, outlier detection, and clustering to show how yield and profitability are improved.
In the first installment of this series, we used the SpringTrader application as an example of an existing legacy Java app, and we outlined an approach to do two things—migrate the app to Pivotal Cloud Foundry and prepare it to run in a microservices architecture. In this post, we dive into the refactoring and modernization of one key service, enabling it to deal with real-time market data.
We are excited to announce that our collaborative, open source math, statistics, and machine learning library, known as MADlib, is entering Apache incubation. In addition to hearing from a few MADlib industry experts, this post explains what MADlib is, provides a short history, describes why it is moving to Apache®, outlines its value in the enterprise, and illustrates the current community membership.
Yesterday, Pivotal announced that we donated the Pivotal HAWQ core to the Apache Software Foundation (ASF) and it is now an officially incubating project. Apache HAWQ is a redesign of the HAWQ architecture to enable greater elasticity to meet the requirements of a growing user base. With the addition of YARN support and its acceptance as an Apache project, HAWQ is now more than ever a truly Hadoop Native SQL Engine. This blog is a technical primer on the background and architecture of Apache HAWQ.
Today, Pivotal announced it has open sourced HAWQ and MADlib, contributing them to the Apache Software Foundation (ASF) where they are now officially listed as incubating. In this post, Pivotal’s data engineering leader, Gavin Sherry, explains why HAWQ and MADlib are needed to create a Hadoop Native SQL infrastructure, and why the only way forward to do that is through open governance and curation managed by the ASF.
Pair programming and agile development have become popular buzzwords in recent years, but these practices have been fundamental to how Pivotal Labs develops and delivers products since its inception. As a result, companies often collaborate directly with Labs to develop applications, while also learning techniques of close collaboration and iterative development. Members of IDEO Labs, the research and development group within global design agency IDEO, recently spent a month at Pivotal Labs to build an application using agile development methods.
Strata + Hadoop World NYC is just around the corner. As a longtime sponsor, we are looking forward to this event and networking with folks across the community. In this post, you can find out how to connect with us: booth demos, presentations, networking events, meetups, customer dinners, and more. And we have some big news coming. So stay tuned!
In this post, Pivotal’s Chief Scientist, Jignesh Patel, recaps the key talks and papers contributed by Pivotal at the 41st annual VLDB conference. Covering common table expressions, improvements from hardware/software collaboration, a new perspective on benchmarks, R and hardware utilization issues, and mapping relational learning to relational algebra, Jignesh explains and provides reference links to all of the topics. He also provides links to the presentations being covered again at several meetups.
Time series data is produced in domains such as IT operations, manufacturing, and telecommunications. Examples of time series data include the number of client logins to a website on a daily basis, cell phone traffic collected per minute, and temperature variation in a region by the hour. Forecasting a time series signal ahead of time helps us make decisions such as planning capacity and estimating demand. In this post, Anirudh Kondaveeti examines the modeling steps involved in forecasting a time series that includes multiple seasonal periods.
Part three of the Cloud Native Journey series is focused on legacy application migration and modernization. It provides a variety of factors and inputs to help guide decision-making—which legacy apps make sense to migrate, what portions make sense, where governance causes harm, how to identify agile candidates, and much more.