Tech, Cloud and Programming

Data Engineering and ChatGPT

|

ChatGPT is out and everybody is trying it. Blogs, songs are easy, but can it help with Data Engineering?

question 1

i have a table named 'people' with fields: id, name, job_type and a table names 'jobs' with fields: id,job_name. Create a join on this two tables in sql

To create a join on two tables in SQL, you can use the JOIN keyword. Here's an example:

SELECT people.name, jobs.job_name
FROM people
INNER JOIN jobs ON people.job_type = jobs.id

This query will return a table with columns name and job_name, and each row will represent a person and the job they have. The JOIN keyword is used to specify the tables to be joined, and the ON keyword is used to specify the criteria for matching rows in the two tables. In this case, the ON clause specifies that rows in the people table should be matched with rows in the jobs table based on the value of job_type in the people table and the value of id in the jobs table.


Impressive, but let's see how much it can do.

question 2

can you insert some random data in both tables

To insert data into a table in SQL, you can use the INSERT INTO statement. Here's an example:

INSERT INTO people (id, name, job_type)
VALUES (1, 'John', 2),
       (2, 'Jane', 1),
       (3, 'Bob', 3);

This will insert three rows into the people table. Similarly, you can use the INSERT INTO statement to insert data into the jobs table as well:

INSERT INTO jobs (id, job_name)
VALUES (1, 'Teacher'),
       (2, 'Engineer'),
       (3, 'Doctor');

This will insert three rows into the jobs table. Note that the values in the job_type column of the people table should correspond to the values in the id column of the jobs table, so that the join will work correctly.


Great, let's try how it's context is.

question 3

can you show the result of the join with these example data

Here's an example of how you can use the JOIN keyword to join the people and jobs tables and show the results:

SELECT people.name, jobs.job_name
FROM people
INNER JOIN jobs ON people.job_type = jobs.id

This query will return a table with columns name and job_name, and each row will represent a person and the job they have. For example, if the data in the people and jobs tables are as follows:

people
id  name  job_type
1   John  2
2   Jane  1
3   Bob   3

jobs
id  job_name
1   Teacher
2   Engineer
3   Doctor

Then the result of this query will be:

name    job_name
John    Engineer
Jane    Teacher
Bob     Doctor

This shows that John is an engineer, Jane is a teacher, and Bob is a doctor.


I'm a AWS Data Engineer, curious how far it goes.

question 4

can you write the terraform code to execute this query in athena

To execute a SQL query in Amazon Athena using Terraform, you can use the aws_athena_query_execution resource. Here's an example:

resource "aws_athena_query_execution" "example" {
  query = <<QUERY
SELECT people.name, jobs.job_name
FROM people
INNER JOIN jobs ON people.job_type = jobs.id
QUERY

  result_configuration {
    output_location = "s3://my-bucket/query-results"
  }
}

This code will create an Athena query execution that runs the specified SQL query. The query results will be saved to the specified S3 bucket. Note that this code assumes that the people and jobs tables have already been created in Athena. You will need to use additional Terraform code to create these tables. For more information, see the AWS Athena documentation.


Very impressive, gives the example and the explanation. How far can we go with this?

question 5

can you write this query in pyspark in a glue job

! Hmm...something seems to have gone wrong. Maybe try me again in a little bit.


Lucky :) seems we still have some work as a data engineer, but ChatGPT is very powerful. But the amount of progress is staggering. In a short while it'll be able to create complex etl jobs. With access to the data it will analyse data and suggest analytics.

Exciting development!