2023   AI   Programming

DIY: Summarise Your Speech

I like discussing ideas out loud, so I record my thoughts with Voice Memos on my iPhone. I talk for hours but never listen to the recordings, because it gets boring.

We will make an app that transforms audio into text. Then we will make a structured summary, and even a Business Model Canvas of an idea.

Plan

  1. Install OpenAI Library
  2. Audio to Text
  3. Audio to Summary
  4. Audio to Business Model Canvas
  5. Draw a Business Model Canvas Automatically

Requirements

You need Python and an OpenAI account.

If you don’t have Python or don’t know how to work with Python libraries, take a look at my previous tutorial about object detection in images, where we installed it on Mac and Windows.

If you run into errors or don’t understand something, ask ChatGPT.

Install OpenAI Library

Open “Terminal” on Mac or the terminal in Visual Studio Code and type the following:

pip install openai

The library serves three purposes: asking ChatGPT, converting speech to text, and generating images. We use the ChatGPT and speech-to-text functions.

Get the API key

OpenAI is not free to use: transcription costs $0.006 per minute of audio. Therefore, we need to register on the OpenAI website and get an API key, which we insert into the code later. OpenAI also gives new users $5 of credit.
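To see what a recording will cost before sending it, here is a minimal sketch of the arithmetic. It assumes the per-minute price quoted above still applies; the function name is my own.

```python
# Price per minute of audio as quoted in this article (assumption: it may have changed since).
PRICE_PER_MINUTE_USD = 0.006


def transcription_cost(minutes: float) -> float:
    """Return the estimated transcription cost in US dollars."""
    return round(minutes * PRICE_PER_MINUTE_USD, 4)


print(transcription_cost(10))  # a 10-minute voice memo
print(transcription_cost(60))  # a one-hour meeting
```

With these numbers, the $5 starting credit covers many hours of speech.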

Get the API key after registration.
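An API key works like a password, so it is safer not to type it directly into a script that you might share. One common approach, sketched here, is to read it from an environment variable. The variable name OPENAI_API_KEY and the helper name are assumptions for illustration.

```python
import os


def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

In the script you would then write openai.api_key = get_api_key() instead of pasting the key itself.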

Record your voice

Use the “Voice Memos” app on Mac or iPhone, or any other recording app. Most audio file formats are supported by Whisper, the speech-to-text model from OpenAI.

Speak for up to 10 minutes in English. Next, move the recording file to the same directory where you will store the Python file. You may also copy the path to the audio file instead of moving it.
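Before uploading, it can help to check the file. The sketch below assumes the limits OpenAI documented for the Whisper API at the time of writing (a 25 MB upload limit and a short list of file types); check the current documentation, because these may change. The function name is hypothetical.

```python
from pathlib import Path

# Assumption: file types and size limit as documented by OpenAI for the Whisper API.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_SIZE_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Return a list of problems found; an empty list means the file looks fine."""
    file = Path(path)
    if not file.is_file():
        return [f"File not found: {file}"]
    problems = []
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported file type: {file.suffix}")
    if file.stat().st_size > MAX_SIZE_BYTES:
        problems.append("File is larger than 25 MB")
    return problems
```

For example, check_audio_file("audio.m4a") returns an empty list when the recording is ready to send.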

Audio to Text

import openai

# Insert your API key (keep it private and never publish a real key)
openai.api_key = "YOUR_API_KEY"

# Define the path to the file. If the file is in the same folder as the Python code, the file name is enough
path = "audio.m4a"
audio_file = open(path, "rb")

# Select the AI to use (whisper-1 is the newest in April 2023)
transcript = openai.Audio.transcribe("whisper-1", audio_file)

print(transcript.text)

We have transcribed our speech into text.
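Since every transcription costs money, you may want to keep the result. Here is a small sketch that saves the text next to the audio file; the helper name is my own and not part of the OpenAI library.

```python
from pathlib import Path


def save_transcript(text: str, audio_path: str) -> Path:
    """Write the text to a .txt file next to the audio file and return its path."""
    out_path = Path(audio_path).with_suffix(".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

In the script above, save_transcript(transcript.text, path) would create audio.txt in the same folder.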

Audio to Summary

We use GPT-3 (the text-davinci-003 model) to make a summary. You may use gpt-3.5, which costs 10 times less, or gpt-4 for a more precise summary of texts longer than 4,000 words.

The maximum number of tokens is 4096, counting the prompt and the answer together. A token is the unit in which the AI stores text. For English, each token represents nearly one word (700 words are nearly 1000 tokens). For Russian, German, and French, each token is nearly one letter (700 words are nearly 5000 tokens). Translate the text to English and only then use ChatGPT: you will pay about 5 times less.
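To check in advance whether a transcript fits, here is a rough sketch based on the rule of thumb above (700 English words are about 1000 tokens). It is only an estimate, not the real tokenizer, and reply_tokens is a hypothetical parameter for the space reserved for the answer.

```python
CONTEXT_LIMIT = 4096  # token limit mentioned in this article


def estimate_tokens(text: str) -> int:
    """Estimate tokens for English text: about 1000 tokens per 700 words, rounded up."""
    words = len(text.split())
    return (words * 1000 + 699) // 700


def fits_in_context(text: str, reply_tokens: int = 1000) -> bool:
    """Check whether the text plus the space reserved for the answer fits in the limit."""
    return estimate_tokens(text) + reply_tokens <= CONTEXT_LIMIT
```

If fits_in_context returns False, shorten the recording or summarise the text in several parts.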

In the prompt, we add the transcribed text and the task that we want the AI to do.

# Save the transcribed text in a variable
text = transcript.text

prompt = (
        f"Make the summary of the text: \n\n{text}\n\n"
    )

response = openai.Completion.create(
    model="text-davinci-003", 
    prompt=prompt,
    max_tokens=1000,  # limit for the answer; the prompt and the answer together must fit in 4096 tokens
    n=1,
    stop=None,
    temperature=0.5,
)

print(response.choices[0].text.strip())

Combine this code with the previous one and run the script. That’s it.

Audio to Business Model Canvas

If I talk about founding a business, why not arrange the text into a business model canvas?

prompt = (
    f"Create a business model canvas based on the idea explained below:\n\n{text}\n\n"
    "Use the following structure of 9 blocks:\n"
    "Customer Segments:\n"
    "Value Propositions:\n"
    "Channels:\n"
    "Customer Relationships:\n"
    "Revenue Streams:\n"
    "Key Resources:\n"
    "Key Activities:\n"
    "Key Partners:\n"
    "Cost Structure:"
)

You get the structured answer, and you can insert the data in your business model canvas template.
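To move the answer into a template automatically, the text can be split into the nine blocks. This is a minimal sketch that assumes the model follows the requested structure, with each block name followed by a colon; real answers may vary, and the function name is my own.

```python
BLOCKS = [
    "Customer Segments", "Value Propositions", "Channels",
    "Customer Relationships", "Revenue Streams", "Key Resources",
    "Key Activities", "Key Partners", "Cost Structure",
]


def parse_canvas(answer: str) -> dict:
    """Split the model's answer into a dictionary: block name -> list of items."""
    canvas = {name: [] for name in BLOCKS}
    current = None
    for raw_line in answer.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Is this line the heading of one of the nine blocks?
        heading = next(
            (b for b in BLOCKS if line.lower().startswith(b.lower() + ":")), None
        )
        if heading:
            current = heading
            rest = line[len(heading) + 1:].strip()
            if rest:
                canvas[current].append(rest)
        elif current:
            # A bullet or plain line that belongs to the current block
            item = line.lstrip("-•* ").strip()
            if item:
                canvas[current].append(item)
    return canvas
```

For example, parse_canvas(response.choices[0].text) gives a dictionary that is easy to save as JSON or show on a web page.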

Draw business model canvas

Let’s make a real business model canvas out of the idea described in the audio. We need to build a simple website with CSS, HTML, JSON, and Python. I prepared it for you:
Download the folder

Install Flask library

pip install flask

Run text_to_bmc.py, audio_to_bmc.py, or bmc_web.py (if you don’t have an OpenAI account) from the downloaded folder. Don’t forget to fill in the OpenAI API key and the audio file or text.

Open http://127.0.0.1:5000 in the browser, or whichever address is printed in your terminal.

You may also run audio_to_bmc.py or text_to_bmc.py, where the full code is written; just edit the API key and that’s it.

That’s it. You no longer need to think about how to structure the business model canvas.

Challenge

Create a House of Quality model that shows the difference between you and your competitors.

Create a business plan template that is filled in automatically from your recording: add the SWOT and PESTEL analyses, an intro about the product, an executive summary, the team structure, and maybe competitor and market analysis, using the knowledge from my “DIY” articles.

Share the tutorial with your friends and colleagues. If you want to see how to record one hour of a speech or team meeting, and how to record in any language and cheaply, then leave a comment. Meanwhile, read the other DIY articles; you can find them by typing “DIY” in the search at the top right. Also, check out the articles about finances and urban planning.

If you want to use a similar app casually, subscribe to my Telegram bot. @speech_into_text_bot can listen to up to 10 minutes of speech, transform it into text, and summarise everything you said. It costs just 20 cents per 10 minutes of speech. The money is spent on the OpenAI API and server maintenance.

2023   AI   DIY   Programming

DIY: Image Recognition

You have a farm, and you need to count the number of cows in the field to make sure none are missing. A machine can do this for you:

A program has found 77 cows in one second.

Real-life examples: count customers who entered a store today, sort fruits, or count cars in a parking lot.

We will create a simple image analyzer to count the number of cows in a field and automatically highlight them in a photo. No programming experience required. If you don’t understand something, just ask ChatGPT.

The code is universal, so you can detect people or other objects just by editing a single line of code. We’ll use YOLOv8, a cutting-edge, highly accurate image recognition library that’s free to use.

Requirements:

  • Computer (Mac or Windows or Linux)
  • Internet
  • Python (we will install it)
  • AI model YOLOv8 (we will download it)
  • Image of cows (we will download it)

Set Up the Workspace

The coding process involves two parts: writing code and executing it. Typically, we write code in a text editor, and we execute it using a tool like “Terminal” on a Mac. However, there are applications specifically designed for coding, like Visual Studio Code, which lets you write and execute code on the same page.

Visual Studio Code will also help you install Python. You can also install extensions that make coding faster (such as Python autocomplete and Copilot) or an extension that stores your code online (like a Git repository).

Install Python

Python is a programming language. We’ll use it to write code and install the AI models. To run the Python code, we’ll use Visual Studio Code. To add Python support to it, open “Extensions” on the left bar, type “Python” in the search field, and install the Python extension. Python itself is installed in the next step.

Install Python on Mac

During installation, you may encounter errors. Ask ChatGPT for help resolving them (e.g., copy and paste the error message).

To install Python on a Mac, you need to download Homebrew, which is like an app store for programmers. Open the “Terminal” application (press Command+Space and type “Terminal”), then copy and paste the code below into the “Terminal”:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Press Enter and follow the instructions that appear.

After installing Homebrew, it’s recommended to run

brew update

This ensures you have the latest version of Homebrew and that you’ll download the most recent version of Python.

Next, download Python by typing

brew install python

Follow the instructions and ask ChatGPT if you encounter any errors.

Install on Windows

Visit the official Python installation page and select the latest Python installer for Windows.

Installation is straightforward, but you might encounter some problems. If you do, explain the errors to ChatGPT (e.g., copy and paste the error message).

Write Code

Create a new document by pressing Command+N. Next, save the file using Command+S as “analyze_image.py”, where “analyze_image” is the name, and “.py” indicates a Python file.

Now, write a simple program that prints “Hi!”:

print("Hi!")

Save the code by pressing Command+S.

Run Code

To run the code, click the play button. You will see the output in the “Terminal” window at the bottom.
(Note: your terminal may look different from mine, but that makes no difference.)

You will get the answer here.

You may also press the F5 button. The first time you use it, you will be asked which debugger to use (here, debugging works the same as running the code). Select the first one.

Alternatively, you can run a program using the terminal in Visual Studio Code or the “Terminal” on Mac. This is slightly more challenging. First, you should open the folder where the Python file is located. This is displayed at the top of Visual Studio Code.

To open it, type the “cd” command (change directory) followed by the path in the command line. Don’t include the name of the file; stop at the last folder.

cd /Users/daniil_kovekh/Desktop/bot/Image/find_cow

Now, you can list all the files in the folder.

ls

To run the Python code, type “python” and the name of the file with “.py” at the end.

python analyze_image.py

That’s it. If you encounter an error, ask ChatGPT for help. There may be a mistake in the code.

Download Libraries

Basics

A library is code written by other developers. Here’s how to use one:

For example, if you want to create a graph showing the number of customers who visited your store last week, you could use an existing graphing library like “Matplotlib.” This library can draw any graph you need.

First, download the library from the internet. Use the command line in Visual Studio Code for this (you can also use the “Terminal” application on Mac). We’ll use the Python installer, called “pip.”

pip install matplotlib

Press “Enter” on your keyboard.

After it’s downloaded, simply write this code.

The “#” symbol is used for comments. The text after it on the line is visible to you, but ignored by the machine.

# Import the library and the module pyplot. The module contains functions.
import matplotlib.pyplot

# Define the data. We store data as lists - the elements that have the same format.
# days is a list that contains the text - it is a "string" format
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# customers is a list containing numbers - "integer" format
customers = [120, 140, 110, 100, 170, 200, 210]

# Add the data to the graph (X, Y).
# We use a bar graph - the function is also named "bar".
matplotlib.pyplot.bar(days, customers)

# Draw and display the graph.
matplotlib.pyplot.show()

Save the code with the Command+S shortcut, and click the play button to run the code (or use the other methods that we learned previously). The graph appears in another window, or maybe on another desktop.

Free or Paid

Most libraries are free, but some require payment. For example, Yahoo Finance offers free access to stock price data. However, OpenAI, the provider of ChatGPT, charges $0.06 for every 1,000 tokens (roughly 700 words) generated with GPT-4.

YOLOv8, the image analyzer, is free to use.

Internet Access

Some libraries require internet access. Yahoo Finance and OpenAI both work with the internet. Yahoo Finance uses the internet to access its database of stock prices. OpenAI uses the internet for another reason: it receives the text you wrote, runs the AI program on its server, and sends you the answer.

YOLOv8 works both online and offline. It needs the internet once, to download the pre-trained model that identifies what’s in a picture; after that, the downloaded file works offline. The same is true if you train the model yourself, by uploading hundreds of photos and labelling what is shown in them.

Import YOLOv8 Library

The YOLO library is developed by Ultralytics. To install it, simply run the following command:

pip install ultralytics

You might encounter errors regarding the need to download other libraries or outdated packages. Just copy and paste those error messages into ChatGPT.

Get Images of Cows

We’ll find cows in meadows by using these images:
https://koveh.com/img/cows.jpg
https://koveh.com/img/cows2.jpg
https://koveh.com/img/cows3.jpg
Download these images and place them in the folder containing the “analyze_image.py” script.

Write Cow Detection Code

Let’s write the basic code to detect cows in the images. First, set the path to the source image. Then, define the model – we’ll use a simple universal pre-trained model called “yolov8n.pt”. Next, determine whether to save a new image with highlighted cows. Also, set the confidence threshold for the computer to identify a cow (0 to 1). The lower the confidence threshold, the more likely a cow is to be found, but the higher the risk that another object will be labelled as a cow (e.g., a car may become a cow). Set the confidence threshold to 0.4.

Additional settings can be found on the official webpage.

Note: We don’t specifically define that we’re searching for cows; the code simply identifies the most obvious objects in the picture. We’ll fine-tune this later.

from ultralytics import YOLO

source = "/Users/daniil_kovekh/Desktop/bot/Image/find_cow/cows.jpg"

# Load a YOLOv8 model from a pre-trained weights file
model = YOLO('yolov8n.pt')

# Find the cows with confidence 0.4 and save the image
model.predict(source, save=True, conf=0.4)

The results are saved in the runs/detect/predict folder. If the “predict” folder already exists, a number is appended to the new folder name (“predict2”, and so on).

Here are the results of an image with 2 cows on it:

Results with a large number of cows:

As we can see, when the cows are smaller and overlap each other, the code makes mistakes due to uncertainty.
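The confidence threshold works like a simple filter over the detections. Here is a small illustration with made-up numbers (not real YOLOv8 output) of how lowering the threshold finds more cows but also lets a car slip through.

```python
# Made-up detections for illustration: (label the model guessed, confidence)
detections = [("cow", 0.91), ("cow", 0.55), ("cow", 0.32), ("car", 0.35), ("cow", 0.18)]


def keep_confident(detections, conf):
    """Keep only detections whose confidence is at least the threshold."""
    return [(label, score) for label, score in detections if score >= conf]


for conf in (0.4, 0.3, 0.1):
    kept = keep_confident(detections, conf)
    print(conf, [label for label, _ in kept])
```

At 0.4 only the two most certain cows remain; at 0.3 one more cow appears together with the car.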

Improve Results – Decrease Confidence

To improve the results, we can adjust the confidence threshold and the Intersection over Union (IoU) threshold, which controls how overlapping detections are handled, such as when a cow hides behind a tree. Change these settings within a range of 0 to 1.

from ultralytics import YOLO

source = "/Users/daniil_kovekh/Desktop/bot/Image/find_cow/cows.jpg"

# Load a YOLOv8 model from a pre-trained weights file
model = YOLO('yolov8n.pt')

# Find the cows with confidence 0.4 and IoU 0.4, and save the image
model.predict(source, save=True, conf=0.4, iou=0.4)

The results may still not be ideal, because when the confidence threshold is low, the AI might misidentify other objects, like cars, as cows.
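To make the IoU setting less abstract, here is a minimal sketch of how Intersection over Union is calculated for two boxes. The box format (x1, y1, x2, y2) is an assumption for this example. As far as I understand, YOLO uses the value to decide whether two overlapping boxes describe the same object: when the overlap is above the threshold, only the more confident box is kept.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping area (zero if the boxes do not touch)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # two boxes that share one corner area
print(iou((0, 0, 2, 2), (5, 5, 6, 6)))  # boxes that do not overlap at all
```

A lower IoU threshold removes more overlapping boxes, which helps against double counting but can hide a cow standing behind another one.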

Improve Results – Increase Image Size

By default, YOLOv8 scales every image to 640 pixels before analysing it, so small cows in a large photo lose detail. The size of cows3.jpg is approximately 5000x3000 pixels, while cows.jpg has a different size. Determine the appropriate image size for each picture using the Python Imaging Library (PIL).

Install PIL using the terminal and pip command:

pip install Pillow

Now, write Python code in Visual Studio Code:

from PIL import Image
from ultralytics import YOLO

source = "/Users/daniil_kovekh/Desktop/bot/Image/find_cow/cows3.jpg"

# Load the image and get its width
image = Image.open(source)
width, _ = image.size  # "_" receives the height, which we don't need

# Load a YOLOv8 model from a pre-trained weights file
model = YOLO('yolov8n.pt')

# Use imgsz equal to the image width
model.predict(source, imgsz=width, save=True, conf=0.4, iou=0.4)

This approach yields better results, but feel free to experiment with the settings to further improve them.
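One setting to experiment with is the image size itself. Very large sizes are slow and use a lot of memory, and as far as I know YOLOv8 expects a size that is a multiple of 32 and adjusts other values with a warning. The sketch below picks such a size; the helper name and the 1920 pixel cap are my own assumptions, not recommendations from Ultralytics.

```python
def pick_imgsz(width: int, max_size: int = 1920, stride: int = 32) -> int:
    """Cap the width at max_size and round it up to a multiple of the stride."""
    capped = min(width, max_size)
    return ((capped + stride - 1) // stride) * stride


print(pick_imgsz(5000))  # a large photo is capped
print(pick_imgsz(1000))  # a smaller photo is rounded up to a multiple of 32
```

You could then pass imgsz=pick_imgsz(width) to model.predict instead of the raw width.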

Count the Cows

To count the cows, we’ll tally the detections whose class is “cow”. The model reports each detected object with a class and a confidence, similar to the list below:

animals = [("cow", 0.8), ("cow", 0.5), ("dog", 0.4), ("cow", 0.66), ("duck", 0.4)]

This list records each detected object in the image and the confidence that it is the specified animal. We’ll count all the cows present in the list. In the example above, there are three cows.
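The counting idea can be tried without any AI model. A Python dictionary keeps only one value per key, so repeated detections are easier to hold in a list of pairs. This sketch uses the example numbers from above and the standard Counter class.

```python
from collections import Counter

# Example detections from the text: (animal, confidence)
detections = [("cow", 0.8), ("cow", 0.5), ("dog", 0.4), ("cow", 0.66), ("duck", 0.4)]

# Count how many times each animal appears
counts = Counter(label for label, _ in detections)

print(counts["cow"])  # prints 3
```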

from ultralytics import YOLO
from PIL import Image

source = "/Users/daniil_kovekh/Desktop/bot/Image/find_cow/cows3.jpg"

# Load the image and get its width
image = Image.open(source)
width, _ = image.size

# Load a YOLOv8 model from a pre-trained weights file
model = YOLO('yolov8n.pt')

# Find the class index that the model uses for "cow"
cow_class_index = None  # stays None if the model has no "cow" class

for key, value in model.names.items():
    if value == "cow":
        cow_class_index = key  # key is the class index, value is the class name
        break

# Run prediction on the image
results = model.predict(source, save=True, imgsz=width, conf=0.3, iou=0.5)
boxes = results[0].boxes
cow_detections = [box for box in boxes if int(box.cls) == cow_class_index]

print(f"Number of cows in {source}: {len(cow_detections)}")

The computer counts 77 cows (out of 78).

Analyze Anything Else

You’re now equipped to write object detection code for various purposes. If you want to detect people, simply change the value from “cow” to “person” or “cup”.

4 people
18 people
3 cups of coffee
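Switching to another object comes down to finding its class index by name. Here is a small sketch of that lookup as a reusable function. The names dictionary is a hand-copied excerpt and an assumption about what the pre-trained model contains; print model.names on your machine to see the real list.

```python
# Assumption: a short excerpt of the class list of the pre-trained model
names = {0: "person", 19: "cow", 41: "cup"}


def find_class_index(names, label):
    """Return the class index for a label, or None if the model does not know it."""
    for index, name in names.items():
        if name == label:
            return index
    return None


print(find_class_index(names, "person"))
print(find_class_index(names, "unicorn"))  # not in the list, so the result is None
```

In the counting script you would call find_class_index(model.names, "person") and compare each box against the result.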

Explore the documentation and analyze whatever you’d like – you can analyze videos or live streams, detect a person’s movements, or even draw an individual’s skeleton to assess the movements of athletes.

If you require an AI application, website, bot for Telegram, Google extension, automation tool, or web scraper, feel free to contact me at daniil@koveh.com. My team and I are eager to assist you.

2023   AI   Programming