The Power of AI in Problem Solving: A Data Engineer's Perspective
Last week, I received a request from a client to extract data from an existing system and store it in text files in an AWS S3 bucket. The developers needed the data to design their solution.
Attempts
SSMS Export Data: My initial thought was to use the SSMS Export Data wizard. However, it exports one table at a time, which would have been tedious and time-consuming for a whole database.
SSMS Export Data-tier Application: I then tried to export the entire database as a data-tier application, but ran into authorization issues because I wasn't an admin. Requesting elevated access would have lengthened the delivery timeframe.
SSIS package: I created an SSIS package on my local machine to extract the data from the database and store it in a specific folder. However, I encountered validation errors when I attempted to deploy it in the client's environment.
VBA: To avoid delays caused by ticketing and authorization issues, I decided to extract the data programmatically with VBA. I created an Excel macro that replicated the function of the SSIS package. After refining and testing the solution on my local machine, I used ChatGPT to help convert the code into Python.
Solution
Once the data was extracted, moving it to the S3 bucket was straightforward. ChatGPT proved to be a valuable tool for refining the VBA logic and converting it into Python.
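For the upload step itself, boto3 (the AWS SDK for Python) can push each exported CSV to the bucket. This is a minimal sketch, not the exact script used for the client: the bucket name and key prefix are placeholders, and it assumes AWS credentials are already configured in the environment.

```python
import os


def s3_key_for(file_name, prefix='exports'):
    # Build the destination object key under a prefix (hypothetical layout)
    return f'{prefix}/{file_name}'


def upload_folder(export_folder, bucket_name):
    # boto3 is imported here; it must be installed (pip install boto3)
    import boto3
    s3 = boto3.client('s3')  # picks up credentials from the environment
    for file_name in os.listdir(export_folder):
        if file_name.endswith('.csv'):
            local_path = os.path.join(export_folder, file_name)
            # Upload the CSV file to s3://bucket_name/exports/<file_name>
            s3.upload_file(local_path, bucket_name, s3_key_for(file_name))


# Example (placeholder names):
# upload_folder('D:/Users/Export_folder/', 'my-client-bucket')
```

Keeping the key-building logic in its own small function makes the destination layout easy to adjust without touching the upload loop.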
Lessons
As a consultant, it's essential to be adaptable and look for alternative solutions when the usual methods aren't feasible. Problem-solving skills are crucial, and being able to conceptualize solutions across different technologies is a valuable asset. ChatGPT is a useful tool for any programmer to have in their toolbox, but it shouldn't replace human programmers: every problem is unique, and you need to understand the underlying problem to apply ChatGPT's generic solutions effectively.
#DataEngineering #AWS #SSMS #SSIS #VBA #Python #ProblemSolving #Consulting #ChatGPT #ArtificialIntelligence #Technology #Adaptability #DataManagement #DataExtraction #DataProcessing Tentpole Data Sciences
import pyodbc
import csv
import os

# Connection settings
server = 'Server'
database = 'Database'
trusted_connection = 'yes'
driver = '{SQL Server}'

# Set the export folder path
export_folder = 'D:/Users/Export_folder/'

# Set the delimiter for the CSV files
delimiter = ','

# Create a connection to the SQL Server database
conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server + ';DATABASE=' + database + ';Trusted_Connection=' + trusted_connection)

# Create a cursor object to execute SQL queries
cursor = conn.cursor()

# Get the list of tables in the database
tables_query = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE' ORDER BY TABLE_NAME"
cursor.execute(tables_query)
tables = cursor.fetchall()

# Create the export folder if it doesn't exist
if not os.path.exists(export_folder):
    os.makedirs(export_folder)

# Loop through all tables in the database
for table in tables:
    table_name = table[0]
    try:
        # Select all data from the table
        data_query = 'SELECT * FROM [' + table_name + ']'
        cursor.execute(data_query)
        data = cursor.fetchall()

        # Set the file path and name for the CSV file
        file_path = export_folder + table_name + '.csv'

        # Export the data to a CSV file
        with open(file_path, 'w', newline='') as csv_file:
            csv_writer = csv.writer(csv_file, delimiter=delimiter)

            # Write the column names as the header row
            column_names = [column[0] for column in cursor.description]
            csv_writer.writerow(column_names)

            # Write the data rows
            csv_writer.writerows(data)
    except Exception as e:
        # Log the error and continue to the next table
        print('Error exporting table {}: {}'.format(table_name, str(e)))
        continue

# Close the cursor and connection
cursor.close()
conn.close()