Yolov3 Object Detection: Tesseract-OCR Text Recognition and Automating clicks with PyAutoGUI – Ultimate Guide

Introduction: Welcome to Sly Automation’s guide to object detection with YOLOv3, text recognition with Tesseract-OCR, and automating mouse clicks and screen movement with PyAutoGUI. This tutorial walks you through the steps required to implement these techniques and showcases an example of object detection in action.

Cloning the YOLOv3 Project and Installing the Requirements

Github Source: https://github.com/slyautomation/osrs_yolov3

This project uses PyCharm to run and configure the code. Need help installing PyCharm and Python? Click here! Install Pycharm and Python: Clone a github project

Download YOLOv3 weights

https://pjreddie.com/media/files/yolov3.weights —– save this in ‘model_data’ directory

An alternative is the tiny weights file, which needs less compute and detects faster, but is less accurate.

https://github.com/smarthomefans/darknet-test/blob/master/yolov3-tiny.weights —– save this in ‘model_data’ directory

Convert the Darknet YOLO model to a Keras model.

Type the following in the terminal (in PyCharm, the Terminal window is at the bottom of the page):

pip install -r requirements.txt
python convert.py -w model_data/yolov3.cfg model_data/yolov3.weights model_data/yolov3.h5

Download Resources

Note: if there are issues converting the weights to .h5, use this YOLO weights file in the interim (save it in the model_data folder): https://drive.google.com/file/d/1_0UFHgqPFZf54InU9rI-JkWHucA4tKgH/view?usp=sharing

Go to Google Drive for the large files, specifically the osrs cow and goblin weights file: https://drive.google.com/folderview?id=1P6GlRSMuuaSPfD2IUA7grLTu4nEgwN8D

Step 1: Setting Up the Environment

To begin, we need to set up our development environment. Start by creating an account on NVIDIA Developer’s website. Once done, download the CUDA Toolkit compatible with your GPU. We will use CUDA 10.0 for this example. Install the toolkit and ensure that all components are properly installed.

Check your CUDA version

Check whether your GPU is supported at https://developer.nvidia.com/cuda-gpus, then use the CUDA version for your model and the latest cuDNN release for that CUDA version.

Type in the terminal: nvidia-smi


My GPU supports CUDA versions up to 11.5, but for simplicity I can use an earlier version, namely 10.0.

cuda 10.0 = https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_411.31_win10

Step 2: Downloading and Configuring cuDNN

Next, download the cuDNN library compatible with your CUDA version. Extract the files and copy them into the CUDA installation directory, overwriting any existing files.

Install cuDNN

cudnn = https://developer.nvidia.com/rdp/cudnn-archive#a-collapse765-10

For this project I need CUDA 10.0, so I’m installing Download cuDNN v7.6.5 for CUDA 10.0 (https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.0_20191031/cudnn-10.0-windows10-x64-v7.6.5.32.zip).

Make sure you are logged in; creating an account is free.


Extract the zip file just downloaded for cuDNN:


Copy contents including folders:


Locate the NVIDIA GPU Computing Toolkit folder and the CUDA version folder (v10.0), and paste the contents inside it:


labelImg = https://github.com/heartexlabs/labelImg/releases

tesseract-ocr = https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-setup-3.02.02.exe/download or for all versions: https://github.com/UB-Mannheim/tesseract/wiki

Credit to: https://github.com/pythonlessons/YOLOv3-object-detection-tutorial

Step 3: Creating a Project in PyCharm

Open PyCharm and create a new project for object detection. Set up the necessary directories, including one for storing images and another for Python modules. Use the requirements.txt file and install the required modules using the pip install command.

Github Source: https://github.com/slyautomation/osrs_yolov3

Install Pycharm and Python: Clone a github project

Step 4: Implementing the Screenshot Loop

In this step, we will capture screenshots of the objects we want to detect in our object detection model. Create a Python script to take screenshots at regular intervals while playing the game. Install additional modules such as Pillow, MSS, and PyScreenshot. Run the script and ensure that the screenshots are saved in the designated directory.

Script: screenshot_loop.py

Make sure to change the settings to suit your needs:

monitor = {"top": 40, "left": 0, "width": 800, "height": 640} # adjust to align with your monitor or screensize that you want to capture

img = 0 # counter for image filenames; if restarting the loop, set this higher than the last saved image number, e.g. below, img = 10


mob = 'cow' # change to the name of the object/mob to detect and train for.

Run the script, images will be saved under datasets/osrs
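The capture loop above can be sketched as follows. This is a minimal outline assuming the mss package (pip install mss), not the repo’s exact screenshot_loop.py; the `start` argument plays the role of the img counter, so when restarting, pass a value greater than the last saved image number.

```python
import os
import time

def frame_path(folder, mob, img):
    # Build the output path for capture number `img`, e.g. datasets/osrs/cow_0.png
    return os.path.join(folder, f"{mob}_{img}.png")

def capture_loop(n_frames=10, delay=1.0, folder="datasets/osrs", mob="cow", start=0):
    import mss
    import mss.tools  # third-party: pip install mss
    # Adjust to align with the monitor region or screen size you want to capture
    monitor = {"top": 40, "left": 0, "width": 800, "height": 640}
    os.makedirs(folder, exist_ok=True)
    with mss.mss() as sct:
        for img in range(start, start + n_frames):
            shot = sct.grab(monitor)  # capture the configured region
            mss.tools.to_png(shot.rgb, shot.size, output=frame_path(folder, mob, img))
            time.sleep(delay)  # pause between captures while you play
```

Calling capture_loop(mob='goblin', start=10) would resume saving goblin screenshots from goblin_10.png onwards.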


Step 5: Labeling the Images

Use a labeling tool like LabelImg to draw bounding boxes around the objects in the captured screenshots. Name the bounding boxes according to the objects they represent. Ensure that you label objects in various environments and backgrounds to improve generalizability.

labelImg = https://github.com/heartexlabs/labelImg/releases

Click the link for the latest version for your OS (Windows or Linux); I’m using Windows_v1.8.0.


Open downloaded zip file and extract the contents to the desktop or the default user folder.


Using LabelImg

Open the application labelImg.exe


Click on ‘Open Dir’ and locate the images to be used for training the object detection model.

Also click ‘Change Save Dir’ and set it to the same location; this ensures the YOLO labels are saved alongside the images.


If using the screenshot_loop.py script, set both ‘Open Dir’ and ‘Change Save Dir’ to the PyCharm project directory, then to the osrs_yolov3/OID/Dataset/train osrs folder.

Click on the Create RectBox button (or use the keyboard shortcut W), type the desired name for the annotated object, and click OK.


The right-hand panel lists all the annotated objects by name, and the lower-right panel shows the paths of all the images in the currently selected directory.


Step 6: Generating the Dataset

Create a dataset directory and organize the labeled images and XML files containing the coordinates of the bounding boxes. Generate a text file that lists the paths to the images and their corresponding classes. Convert any PNG images to JPEG format if required.
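The PNG-to-JPEG step can be sketched like this. The repo ships png_jpg.py for the same job, so treat this as an illustrative stand-in that assumes Pillow is installed; the function names are mine, not the script’s.

```python
import os

def jpg_name(png_path):
    # Swap a .png extension for .jpg, keeping the rest of the path
    root, _ = os.path.splitext(png_path)
    return root + ".jpg"

def convert_folder(folder):
    from PIL import Image  # third-party: pip install Pillow
    for name in os.listdir(folder):
        if name.lower().endswith(".png"):
            src = os.path.join(folder, name)
            # JPEG has no alpha channel, so flatten to RGB before saving
            Image.open(src).convert("RGB").save(jpg_name(src), "JPEG")
            os.remove(src)  # drop the original PNG
```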

Step 7: Training the Model

Prepare the necessary YOLOv3 files, including weights, configurations, and anchors. Create a directory for the YOLOv3 scripts. Adjust the batch size, sample size, and other parameters in the training script as per your requirements. Run the training script, ensuring that the GPU is properly detected.

Step 8: Installing Tesseract OCR

Download the Tesseract OCR library from the official repository. Install the program, making sure to select the necessary components. Specify the installation directory during setup.
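Once installed, Tesseract can be called from Python through the pytesseract wrapper. A minimal sketch, assuming pytesseract and Pillow are installed; the default argument below is the typical Windows install path, so point it at wherever you installed tesseract.exe.

```python
def read_text(image_path, tesseract_exe=r"C:\Program Files\Tesseract-OCR\tesseract.exe"):
    import pytesseract     # third-party: pip install pytesseract
    from PIL import Image  # third-party: pip install Pillow
    # Tell the wrapper where the Tesseract binary lives
    pytesseract.pytesseract.tesseract_cmd = tesseract_exe
    # Return the text Tesseract finds in the image, stripped of surrounding whitespace
    return pytesseract.image_to_string(Image.open(image_path)).strip()
```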

Step 9: Real-time Object Detection

Modify the real-time detection script to use the trained model and class for the desired object detection. Run the script and observe the results, with bounding boxes around the detected objects displayed on the screen.
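With detections on screen, PyAutoGUI can move the mouse to a box and click it. A hedged sketch, assuming the detector returns boxes as (left, top, right, bottom) screen coordinates; the function names are illustrative, not from the repo’s detection script.

```python
def box_center(box):
    # Centre of a (left, top, right, bottom) bounding box, in screen pixels
    left, top, right, bottom = box
    return (left + right) // 2, (top + bottom) // 2

def click_detection(box):
    import pyautogui  # third-party: pip install pyautogui
    x, y = box_center(box)
    pyautogui.moveTo(x, y, duration=0.2)  # glide to the target rather than teleporting
    pyautogui.click()
```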

Troubleshooting:

Images and XML files for object detection

For example, unzip these files in the directory OID/Dataset/train/cow/: cows.z01, cows.z02, cows.z03

Add the image and XML files to OID/Dataset/train/name of class — create a folder under train for each class.

Note: images must be in JPG format (use png_jpg.py to convert PNG files to JPG).

Run voc_to_yolov3.py — this creates the image class path config file and the classes list config file.
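For reference, the conversion voc_to_yolov3.py performs can be sketched as follows: read a LabelImg VOC XML annotation and emit one "path x1,y1,x2,y2,class_id" entry per box. The function name and exact output format here are illustrative assumptions, not the script’s actual API.

```python
import xml.etree.ElementTree as ET

def voc_to_line(xml_text, image_path, class_ids):
    # Collect every labelled box in the annotation as "xmin,ymin,xmax,ymax,class_id"
    boxes = []
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        name = obj.find("name").text
        b = obj.find("bndbox")
        coords = ",".join(str(int(float(b.find(k).text)))
                          for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append(f"{coords},{class_ids[name]}")
    # One line per image: the path followed by all its boxes
    return image_path + " " + " ".join(boxes)
```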

If, after running pip install -r requirements.txt, you still get the error cannot import name 'batchnormalization' from 'keras.layers.normalization', download this file and save it to the model_data folder: https://drive.google.com/file/d/1_0UFHgqPFZf54InU9rI-JkWHucA4tKgH/view?usp=sharing.

Resolving the BatchNormalization error

This is the error log for the BatchNormalization issue: https://github.com/slyautomation/osrs_yolov3/blob/main/error_log%20batchnormalization.txt. It is caused by having an incompatible version of TensorFlow; the version needed is 1.15.0:

pip install --upgrade tensorflow==1.15.0

Since Keras has been updated but will still cause the BatchNormalization error, downgrade Keras in the same way, to 2.2.4:

pip install --upgrade keras==2.2.4

Refer to this successful log of python convert.py -w model_data/yolov3.cfg model_data/yolov3.weights model_data/yolov3.h5:

https://github.com/slyautomation/osrs_yolov3/blob/main/successful_log%20no%20batchnormalisation%20issues.txt

Conclusion: Congratulations! You have successfully learned how to perform object detection with YOLOv3, text recognition with Tesseract-OCR, and click automation with PyAutoGUI. By following the steps outlined in this guide, you can automate tasks like mouse clicks and screen movements based on detected objects. Remember to experiment with different environments and objects to enhance the generalizability of your object detection model. Happy automating!

Here’s the video version of this guide:
