Google Adds To BigQuery Big Data Capabilities

Google expands the capabilities of its BigQuery system to allow real-time data stream processing and event analysis.

Google has announced updates to Google BigQuery and Cloud Dataflow — the search giant’s two big data management systems that compete with Amazon Web Services’ DynamoDB and Data Pipeline.

In a blog post, Google’s William Vambenepe, lead product manager for big data on Google’s Cloud Platform, claimed Google has implemented a more thorough “cloud way” of managing big data than other IaaS providers. By that, Vambenepe means the service is provided without the user needing to know anything about how it’s deployed, scaled, or managed, making it a “NoOps” service.

In one update to BigQuery, Google has introduced row-level permissions, a finer-grained approach to granting access to data in a database, according to Vambenepe. With row-level permissions, it’s possible to grant a user access to a particular type of data in a database without opening up neighboring data to inspection.

Row-level permissions make it easier to share internal data with a variety of users. Partners or other parties outside the company can be granted permission to access a BigQuery data set in the cloud, but still be restricted to specific rows, Vambenepe wrote in his April 16 blog post.
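To make the idea concrete, here is a toy sketch of row-level filtering. This is purely illustrative: BigQuery enforces row-level permissions server-side, and the function, field names, and grant structure below are invented for this example.

```python
# Illustrative model only, not the BigQuery API: each user is granted
# access to a set of row "regions", and queries see only matching rows.

def visible_rows(rows, grants, user):
    """Return only the rows whose 'region' the given user was granted."""
    allowed = grants.get(user, set())
    return [row for row in rows if row["region"] in allowed]

rows = [
    {"region": "emea", "revenue": 120},
    {"region": "apac", "revenue": 95},
    {"region": "amer", "revenue": 210},
]

# A partner account is granted access to EMEA rows only; the
# neighboring APAC and AMER rows stay invisible to it.
grants = {"partner@example.com": {"emea"}}

print(visible_rows(rows, grants, "partner@example.com"))
# -> [{'region': 'emea', 'revenue': 120}]
```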

[Want to learn more about BigQuery competitors? See MongoDB Eyes Bigger, Faster NoSQL Deployments.]

The default ingestion limit for BigQuery has been raised to 100,000 rows per second, per table, with unlimited storage for handling large data analysis tasks. BigQuery works with large structured data sets for SQL analytics, much like a relational database system, or with loosely structured data assembled as JSON (JavaScript Object Notation) objects.

Several NoSQL systems, such as Cassandra and MongoDB, also work with JSON objects.
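“Loosely structured” here means records need not share an identical schema. A minimal example (the record contents are invented for illustration):

```python
import json

# Two loosely structured records: the second adds a nested field the
# first lacks, which schemaless JSON storage tolerates.
records = [
    '{"user": "alice", "action": "login"}',
    '{"user": "bob", "action": "purchase", "item": {"sku": "A-42", "qty": 2}}',
]

parsed = [json.loads(r) for r in records]
print(parsed[1]["item"]["qty"])  # -> 2
```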

The Google Cloud Platform also introduced the beta version of a new service, Google Cloud Dataflow. Cloud Dataflow provides event/time-based data stream processing, available as an on-demand service. Stream processing can also be scheduled as a batch service, if the Google Cloud user chooses.

A Cloud Dataflow user doesn’t need to set up a cluster on which to run the stream-flow processing.

“Just write a program, submit it, and Cloud Dataflow will do the rest,” Vambenepe wrote.

Stream processing and event-related processing are done on a data stream, such as a feed of stock trades from an exchange, with the system looking for trades at a particular level of pricing, or at particular time intervals. Stream processing can also be used against an application’s server log, where it watches for particular software events in the application and triggers an alert when it spots one.
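The stock-trade case can be sketched in a few lines. This is not the Cloud Dataflow API, just a minimal generator-based model of watching a feed for trades above a price threshold; the feed contents and threshold are made up.

```python
# Hedged sketch: scan a stream of trades and emit an alert for every
# trade whose price crosses the configured threshold.

def price_alerts(trades, threshold):
    """Yield an alert string for each trade priced above the threshold."""
    for trade in trades:
        if trade["price"] > threshold:
            yield f"ALERT: {trade['symbol']} traded at {trade['price']}"

feed = [
    {"symbol": "GOOG", "price": 540.0},
    {"symbol": "GOOG", "price": 556.5},
    {"symbol": "GOOG", "price": 538.2},
]

for alert in price_alerts(feed, 550.0):
    print(alert)  # -> ALERT: GOOG traded at 556.5
```

The same shape applies to the server-log case: replace the price test with a match on a particular software event.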

Google’s BigQuery processing and Cloud Dataflow stream analysis are now connected to another service — Cloud Pub/Sub — to allow notice of event occurrence to selected IT administrators or business end-users. Vambenepe wrote that Cloud Pub/Sub “completes the platform’s end-to-end support for low-latency data processing.”
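The publish/subscribe pattern behind that notification flow can be modeled in a few lines. This is a toy in-memory version, not the Cloud Pub/Sub API: subscribers register callbacks on a topic, and every message published to that topic is delivered to each of them.

```python
from collections import defaultdict

class PubSub:
    """Minimal in-memory publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to receive messages on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for callback in self._subscribers[topic]:
            callback(message)

bus = PubSub()
received = []
bus.subscribe("alerts", received.append)  # e.g. an admin's notifier
bus.publish("alerts", "threshold crossed")
print(received)  # -> ['threshold crossed']
```

In the real service, the stream processor would publish to a Pub/Sub topic and the administrators' tooling would subscribe; the decoupling shown here is the point.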

The data stream processing capabilities of open source systems such as Hadoop, Spark, and Flink may be used with BigQuery as well, Vambenepe wrote. Google will provide connectors between those systems and its BigQuery and Cloud Storage services.

“Scuba equipment helps humans operate under water,” observed Vambenepe, but divers are no match for the agility of creatures that belong in the water. “When it comes to big data and the cloud, be a dolphin, not a scuba diver,” he concluded.

