How to let AI help data analysts with their work: job 1

Background:

OpenAI has opened up its API, and a few days ago Peking University released a tool called ChatExcel. The two events are unrelated, but engineers with time on their hands always find something to do. In a tech group chat, I bragged that now that OpenAI has opened its API, I could build a ChatExcel too, and an even better one. The requirements I set for myself:

1. Accept requests in natural language

2. Understand user needs accurately

3. Give accurate analysis results

4. Present the results as a visual report

5. If possible, also produce a PPT presentation

And so the pit-filling begins again. To build a proof of concept (PoC) of the product as quickly and cheaply as possible, I used the OpenAI API together with visual ChatGPT. In a real product, all of this would have to be packaged and built on the OpenAI API alone; the user would see only a request box and a place to upload a CSV table. Here I am only probing the upper and lower bounds of what the product can do, so please allow me this harmless shortcut.

The idea is as follows:

1. After the user uploads the table, parse the header to extract the meta information, which is used later to analyze the user's request

2. Format the input description, then let the OpenAI API generate the data analysis code automatically (in a commercial version, OpenAI could also convert users' loosely worded requests into the formatted input)

3. Parse the generated Python code and save it as a .py file

4. Use Python's os package to execute the script, and convert the data visualization into HTML so it can be opened with a click
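
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiment: the function names, the prompt wording, and the `generate_code` callable (a stand-in for the OpenAI API call) are all assumptions. It also runs the script with `subprocess` rather than the `os` package, so that the output can be captured.

```python
import csv
import io
import subprocess
import sys
import tempfile
from pathlib import Path


def parse_header(csv_text):
    """Step 1: read only the header row and return the column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


def build_prompt(columns, csv_path, request):
    """Step 2: turn the user's request into a formatted prompt (hypothetical wording)."""
    return (
        "Write a Python script for the task below.\n"
        f"Input file: {csv_path}\n"
        f"Columns: {', '.join(columns)}\n"
        f"Task: {request}\n"
        "Return only Python code."
    )


def save_and_run(code, script_path):
    """Steps 3 and 4: save the generated code as a .py file and execute it."""
    script_path = Path(script_path)
    script_path.write_text(code, encoding="utf-8")
    # Assumption: subprocess replaces the os-based call described in the text.
    result = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True, text=True, timeout=60,
    )
    return result.returncode, result.stdout, result.stderr


def run_pipeline(csv_text, csv_path, request, generate_code, script_path):
    """Chain the steps; generate_code is any callable mapping prompt -> code."""
    columns = parse_header(csv_text)
    prompt = build_prompt(columns, csv_path, request)
    code = generate_code(prompt)
    return save_and_run(code, script_path)


# Usage with a stand-in generator (a real one would call the OpenAI API).
def fake_generate_code(prompt):
    return "print('analysis done')"


with tempfile.TemporaryDirectory() as workdir:
    returncode, stdout, stderr = run_pipeline(
        csv_text="Product Name,Product Review\nProduct A,Great\n",
        csv_path="product_reviews.csv",
        request="List the ten most frequent words in the reviews",
        generate_code=fake_generate_code,
        script_path=Path(workdir) / "generated_analysis.py",
    )
print(returncode, stdout.strip())
```

Keeping the generator behind a plain callable makes it easy to swap the stand-in for a real API call later, and running generated code in a separate process with a timeout limits the damage a bad script can do.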

Let's see how it works in practice:

Call the API with a natural-language description to generate the code. The returned code comes back jammed together, so it has to be parsed
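
One way to do this parsing in code is sketched below. It assumes the reply wraps the code in Markdown-style fences and mixes it with explanatory text; the actual shape of the API reply is not shown in this post, so treat the pattern and the helper names as hypothetical. A syntax check with `ast.parse` is added so that broken code is caught before it is saved.

```python
import ast
import re

# Build the fence marker at runtime so this listing contains no literal fence.
FENCE = "`" * 3
FENCE_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_python_code(response_text):
    """Return the fenced code in the reply, or the whole reply if no fence is found."""
    blocks = FENCE_RE.findall(response_text)
    if blocks:
        return "\n".join(block.strip("\n") for block in blocks) + "\n"
    return response_text.strip() + "\n"


def is_valid_python(code):
    """Check that the extracted text at least parses as Python."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


reply = (
    "Here is the code:\n"
    + FENCE + "python\nimport os\nprint(1)\n" + FENCE
    + "\nHope it helps."
)
code = extract_python_code(reply)
print(code, is_valid_python(code))
```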

For the code parsing I took a lazy route and let ChatGPT help analyze the result

The code generated by OpenAI, once parsed, is shown below. Because of the installed package versions, there were some version conflicts. To verify the idea quickly, I gave up on resolving them and asked ChatGPT to generate the code for the task again.

import pandas as pd
import jieba
# NOTE: this top-level import matches older pyecharts releases; newer releases
# expose the chart as pyecharts.charts.WordCloud, which is one likely source
# of the version conflict mentioned above
from pyecharts import WordCloud

# read in the data from the CSV file
data = pd.read_csv('product_reviews.csv')

# split product reviews into individual words and count them
reviews = data['Product Review']
word_freq = {}
for review in reviews:
    words = jieba.cut(review)
    for word in words:
        if word in word_freq:
            word_freq[word] += 1
        else:
            word_freq[word] = 1

# sort the words by frequency
sorted_word_freq = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)

# print the top 10 most frequent words
print('Top 10 most frequent words:')
for word, freq in sorted_word_freq[:10]:
    print(f'{word}: {freq}')

# create a word cloud of the top 50 most frequent words
wordcloud = WordCloud(width=800, height=620)
wordcloud.add("", sorted_word_freq[:50], word_size_range=[20, 100])

ChatGPT parses the task and generates the code

The code above has a few small bugs
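
Since part of the trouble came from package versions, it can help to record which versions are actually installed before running generated code. The sketch below uses only the standard library; the helper name `installed_versions` and the package list are illustrative assumptions, not part of the original experiment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages imported by the generated scripts in this post.
report = installed_versions(["pandas", "jieba", "pyecharts", "wordcloud"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

The resulting versions could also be included in the prompt, so that the model is asked to write code for the environment that actually exists.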

So I tested generating the code directly with ChatGPT

import pandas as pd
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from collections import Counter

# read the data from the table
df = pd.read_csv('product_reviews.csv')

# create a list of stop words
stop_words = ['of', 'up', 'yes', 'I', 'you', 'he', 'she', 'us', 'you', 'them']

# tokenize the product reviews and count the frequency of each word
words_list = []
for review in df['Product Review']:
    words = jieba.lcut(review)
    # collect the tokens; without this, words_list stays empty
    words_list.extend(words)
words_freq = Counter(words_list)

# remove stop words from the word frequency dictionary
for stop_word in stop_words:
    words_freq.pop(stop_word, None)

# sort the word frequency dictionary by descending order of frequency
sorted_words_freq = sorted(words_freq.items(), key=lambda x: x[1], reverse=True)

# print the top 10 most frequent words
print('Top 10 most frequent words in product reviews:')
for word, freq in sorted_words_freq[:10]:
    print(f'{word}: {freq}')

# create a word cloud using the top 50 most frequent words
# generate_from_frequencies expects a dict, so convert the list of (word, count) pairs
wordcloud = WordCloud(background_color='white', width=800, height=400).generate_from_frequencies(dict(words_freq.most_common(50)))

# plot the word cloud
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')

# save the word cloud as an HTML file

ChatGPT even tells you which packages to install
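
The generated script stops at the comment about saving the word cloud as HTML. One simple way to get a clickable HTML file, sketched here as an assumption rather than what ChatGPT produced, is to embed the PNG image in a page as a base64 data URI. The function name `image_to_html` is hypothetical, and the bytes in the usage example are a placeholder, not a real image.

```python
import base64
import html


def image_to_html(png_bytes, title):
    """Wrap PNG bytes in a standalone HTML page using a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f'<body><img alt="{safe_title}" src="data:image/png;base64,{encoded}"></body></html>\n'
    )


# Placeholder bytes; with matplotlib, the real bytes could come from
# plt.savefig(buffer, format="png") on an io.BytesIO buffer.
page = image_to_html(b"\x89PNG placeholder", "Word cloud of product reviews")
print(page)
```

Writing `page` to a file such as `wordcloud.html` gives a single file that opens in any browser without a separate image.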

I also let ChatGPT generate some data for testing

import csv
import random

# List of product names
product_names = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']

# Generate product review data
product_reviews = []
for i in range(1000):
    # Pick a product name at random
    product_name = random.choice(product_names)
    # Generate a random comment
    product_review = f"This is a great {product_name}!"
    # Randomly generate impressions and clicks
    num_exposures = random.randint(1, 100)
    click_count = random.randint(0, num_exposures)
    # Add to product review list
    product_reviews.append([product_name, product_review, num_exposures, click_count])

# Write product review data to a CSV file
with open('product_reviews.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    # write header
    writer.writerow(['Product Name', 'Product Review', 'Number of Exposures', 'Click Count'])
    # write the data rows
    writer.writerows(product_reviews)

Test results of the integrated project code

Generated data

Word cloud produced by the generated code


1. Overall, ChatGPT is already very capable and handles almost every step well. The only minor problems are package versions and data conversion (list vs. dictionary)

2. By breaking the work into specific tasks and chaining the steps together, ChatGPT can handle most of the actual production work

3. Whether using ChatGPT alone or the OpenAI API chained with a simple business process, the results for building AI application products are impressive

4. Future product interaction will be more natural and concise

5. As for the minor problems in point 1, I think they can be solved entirely by fine-tuning on domain-specific code

Tags: Python AI AIGC

Posted by samoi on Fri, 03 Mar 2023 00:17:22 +0530