Wednesday, March 27, 2013

BigQuery: Simplified


Google’s BigQuery now lets users merge multiple large data sets on a common key, using a new feature called Big JOIN.

“Big JOIN simplifies data analysis that would otherwise require a data transformation step, by allowing users to specify JOIN operations using SQL,” states Michael Manoochehri, Developer Programs Engineer, Cloud Platform.

The article uses the example of a web application producing millions of lines of user activity logs in a few hours. Normally, breaking up all that data before analysis would be cumbersome, but Big JOIN eliminates that step, which makes it a good fit for e-commerce sites.
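To make the idea concrete, here is a minimal sketch of joining activity logs to a user table on a common key, written directly in SQL with no separate transformation step. It runs on Python's built-in sqlite3 module purely as a stand-in, not on BigQuery, and the table names (activity_log, users) and their columns are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a small stand-in for BigQuery.
conn = sqlite3.connect(":memory:")

# Hypothetical tables: raw activity log lines and a user lookup table.
conn.executescript("""
    CREATE TABLE activity_log (user_id INTEGER, action TEXT);
    CREATE TABLE users (user_id INTEGER, country TEXT);
    INSERT INTO activity_log VALUES
        (1, 'view'), (1, 'purchase'), (2, 'view'), (3, 'view');
    INSERT INTO users VALUES (1, 'US'), (2, 'DE'), (3, 'US');
""")

# Join the two tables on the shared user_id key and count events per country.
rows = conn.execute("""
    SELECT u.country, COUNT(*) AS events
    FROM activity_log AS a
    JOIN users AS u ON a.user_id = u.user_id
    GROUP BY u.country
    ORDER BY events DESC, u.country
""").fetchall()

for country, events in rows:
    print(country, events)
```

The shape of the query is the point: the join on the common key happens inside the SQL statement itself, which is what Big JOIN is described as offering for tables far too large for a toy database like this one.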

Another popular request was also answered in this update: a native TIMESTAMP data type for importing date and time values while keeping time zone information.
Much like Google Docs, you can now share your data sets from BigQuery’s Web UI, with email notifications.
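As a rough illustration of why keeping time zone information on import matters for the TIMESTAMP type mentioned above, the sketch below parses two date strings with different UTC offsets and shows that they name the same instant. It is plain Python and says nothing about how BigQuery stores timestamps internally; the helper name to_utc_usec is hypothetical, and microseconds since the Unix epoch is used only because the query in the comment below works in that unit.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_usec(text):
    """Parse an ISO 8601 string with a UTC offset into microseconds since the Unix epoch."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Without an offset the instant is ambiguous, so refuse to guess.
        raise ValueError("expected an explicit UTC offset, e.g. -07:00")
    delta = parsed - EPOCH
    # Integer arithmetic avoids floating-point rounding on large values.
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds


# The same moment written with two different offsets.
a = to_utc_usec("2013-03-27T09:00:00-07:00")
b = to_utc_usec("2013-03-27T16:00:00+00:00")
print(a == b)
```

If the offset were dropped on import, the two strings would appear to be seven hours apart, which is the kind of error a time-zone-aware type is meant to prevent.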



“Data analysis with the BigQuery API should be as simple as possible, and that is in keeping with Google’s philosophy overall on many things.”

Manoochehri notes that BigQuery’s public documentation still needs work. However, the team has been working on it and plans to release several tools soon to make it more ‘wieldy’.

Below is a video detailing the updates, with demos.


Article: http://googledevelopers.blogspot.com/2013/03/bigquery-gets-big-new-features-to-make.html

1 comment:

  1. This doesn't require a BIG JOIN, or a BIG GROUP BY aggregation, but you can run a few queries of this size on our public Wikipedia revision dataset using the BigQuery API's free quota.

    /* In cases where we know the username of the editor, returns a table
     * of the most prolific (by revision count) Wikipedia editors of (possibly)
     * Auburn-related pages, aggregated by month, ordered by revision count.
     * The (?i) flag leads the pattern so the title match is case-insensitive.
     */
    SELECT
      title, contributor_username,
      FORMAT_UTC_USEC(UTC_USEC_TO_MONTH(timestamp * 1000000)) AS revision_month,
      COUNT(*) AS revision_count
    FROM
      [publicdata:samples.wikipedia]
    WHERE
      contributor_username != '' AND
      REGEXP_MATCH(title, "(?i)^(War Eagle|Auburn University|Auburn Tigers)")
    GROUP BY
      title, contributor_username, revision_month
    ORDER BY
      revision_count DESC;

    If you have any questions about the API, ask on Stack Overflow with the tag 'google-bigquery'.
