Insta_Delete

Earlier this year I began playing with an open-source Instagram bot, InstaPy, after hearing the creator talk about web automation with Selenium. I was impressed, frankly amazed, by what was possible with Selenium, and I wanted to see how I could incorporate it into both my personal projects and my professional career.

Since 2010, I had posted more images on IG than on Flickr: 8,200+, with a good chunk of them "throwaway" images. I had already backed up my photos, so now came the challenge of cleaning out my feed. Spring cleaning, pruning the hedges, whatever you want to call it, IG doesn't make it easy.

Enter Insta_Delete (no relation to the app that kept popping up in my Google searches).

I decided to build myself a bot, a glorified script really, to first scroll as far back as possible on my feed, then scrape the page for URLs, parse out the href links, save them to a file, log in with a mobile-emulated browser, and delete those old posts.




Screenshot of my starting profile: 8,262 posts.



I wrote a script that is working now, albeit not the most Pythonic or cleanest code. It is doing its job, though, considering I have less than a year of experience with Python (my background is in the Microsoft stack), and I am continually amazed by the power and ease of Python, its ecosystem of packages, and its community.

(Full project available on my GitHub.)

First, I load the necessary packages:

# -*- coding: UTF-8 -*-
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup, SoupStrainer
from datetime import datetime
import time
import os
import sys

Set your paths to the URL file and login details (the login details can also be entered directly in the script):


# store urls to delete later
log_path = 'C:/Users/eddyizm/Source/Repos/seleniumTesting/env/media_urls.txt'
logintext = "C:\\Users\\eddyizm\\Desktop\\Work\\login.txt"
# login text has username on line 1 and password on line 2
# username
# password
URLS = []
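
As a side note, if you'd rather not keep credentials in a plain text file, they could come from environment variables instead. A minimal sketch (the variable names INSTA_USERNAME and INSTA_PASSWORD are my own, not part of the project):


import os

# hypothetical variable names; set these in your shell or scheduled task first
insta_username = os.environ.get('INSTA_USERNAME')
insta_password = os.environ.get('INSTA_PASSWORD')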

Next, I define helper functions to add time waits, open the log, write to the log, parse the href data with Beautiful Soup, and scroll the profile page to the end:


def stime(seconds):
    return time.sleep(seconds)

def OpenLog():
    with open(log_path, 'r', encoding='utf-8') as g:
        return g.read().splitlines()

def WriteToArchive(log, data):
    with open(log, 'w', encoding='utf-8') as f:
        for d in data:
            if d.startswith('https://www.instagram.com/'):
                f.write(str(d) + '\n')
            else:
                f.write('https://www.instagram.com' + str(d) + '\n')

def parse_href(data):
    url_list = []
    for link in BeautifulSoup(data, "html.parser", parse_only=SoupStrainer('a')):
        if link.has_attr('href'):
            t = link.get('href')
            if t is not None:
                url_list.append(t)
    return url_list

def scroll_to_end():
    browser = webdriver.Chrome()
    get_html = None
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print('scrolling profile to get more urls')
    try:
        browser.get("https://www.instagram.com/eddyizm")
        lenOfPage = browser.execute_script(
            "window.scrollTo(0, document.body.scrollHeight);"
            "var lenOfPage=document.body.scrollHeight;return lenOfPage;")
        match = False
        count = 0
        while not match:
            lastCount = lenOfPage
            time.sleep(10)
            lenOfPage = browser.execute_script(
                "window.scrollTo(0, document.body.scrollHeight);"
                "var lenOfPage=document.body.scrollHeight;return lenOfPage;")
            count += 1
            # added count to ensure only older images get picked up
            if (lastCount == lenOfPage) and (count > 100):
                match = True

        get_html = browser.page_source
        browser.close()
        print('scrolled down: ' + str(count) + ' times!')
        print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    except Exception as err:
        print(err)
        browser.close()

    return get_html
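
To give a sense of how these helpers compose, the scraping half of a run boils down to something like this (a sketch; the full script also checks whether the URL file is already populated before scrolling, as the log output further down shows):


html = scroll_to_end()
if html is not None:
    # save every href found on the fully scrolled profile page
    WriteToArchive(log_path, parse_href(html))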

Finally, I added the method to log in to the site and delete the images. Unfortunately, I had a hard time breaking this up into smaller chunks; the challenge is how to pass the Selenium browser session back and forth between functions. I believe that is possible (a rough sketch follows the code), but I have barely skimmed the docs. This long function incorporates the functions above to log in and delete the posts. I currently have the counter set to 15, up from 10, running 5 times a day.


def login_to_site():
    print('logging in as mobile device to delete')
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    mobile_emulation = {"deviceName": "Pixel 2"}
    options = webdriver.ChromeOptions()
    options.add_experimental_option("mobileEmulation", mobile_emulation)
    options.add_argument("window-size=500,800")
    browser = webdriver.Chrome(chrome_options=options)
    browser.get("https://www.instagram.com/accounts/login/")
    stime(3)
    with open(logintext, 'r') as f:
        login = f.read().splitlines()
    insta_username = login[0]
    insta_password = login[1]
    eUser = browser.find_elements_by_xpath(
        "//input[@name='username']")
    stime(1)
    ActionChains(browser).move_to_element(eUser[0]). \
        click().send_keys(insta_username).perform()
    stime(1)
    ePass = browser.find_elements_by_xpath(
        "//input[@name='password']")
    stime(2)
    ActionChains(browser).move_to_element(ePass[0]). \
        click().send_keys(insta_password).perform()
    stime(5)
    login_button = browser.find_element_by_xpath(
        "//form/span/button[text()='Log in']")
    ActionChains(browser).move_to_element(login_button).click().perform()
    stime(10)

    links = OpenLog()
    new_file = []
    deleted_urls = []
    counter = 15
    for l in links:
        if l.startswith('https://www.instagram.com/p/'):
            new_file.append(l)

    print('length of file: ' + str(len(new_file)))
    if counter >= len(new_file):
        counter = len(new_file) - 1

    print('counter: ' + str(counter))

    try:
        print('DELETING POSTS!')
        print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        while counter > -1:
            browser.get(new_file[counter])
            stime(10)
            # already-deleted posts come back as a 404 page; just skip those
            if "Sorry, this page isn't available." in browser.page_source:
                deleted_urls.append(new_file[counter])
                counter -= 1
            else:
                options_button = browser.find_element_by_xpath(
                    "//span[@aria-label='More options']")
                ActionChains(browser).move_to_element(options_button).click().perform()
                stime(10)
                delete_button = browser.find_element_by_xpath(
                    "//button[text()='Delete']")
                ActionChains(browser).move_to_element(delete_button).click().perform()
                stime(10)
                confirm_delete = browser.find_element_by_xpath(
                    "//button[text()='Delete']")
                ActionChains(browser).move_to_element(confirm_delete).click().perform()
                stime(10)
                deleted_urls.append(new_file[counter])
                print('POST DELETED: ' + new_file[counter])
                counter -= 1

        # rewrite the archive without the urls we just deleted
        l3 = [x for x in new_file if x not in deleted_urls]
        print('while loop done and exited successfully')
        print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        WriteToArchive(log_path, l3)
        browser.close()

    except Exception as err:
        print(err)
        browser.close()
        sys.exit()
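
A footnote on breaking this up: the webdriver instance is just an object, so presumably each helper could take the same session as a parameter. A rough, untested sketch (the function names here are hypothetical, not from the repo):


def do_login(browser, username, password):
    # fill out the login form using the shared browser session
    browser.get("https://www.instagram.com/accounts/login/")
    # ...same form-filling steps as in the long function above...

def delete_posts(browser, urls, counter):
    # walk the saved urls and delete them with the same session
    pass

def run():
    browser = webdriver.Chrome(chrome_options=options)  # same mobile options as above
    do_login(browser, insta_username, insta_password)
    delete_posts(browser, OpenLog(), 15)
    browser.close()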

Scheduled Task / Crontab
On my Windows machine, I set up a scheduled task that fires off the script via a batch file, which activates the virtual environment and appends the output to a log file. Linux and Mac would be just as easy using crontab; an example entry follows the batch file.


REM ************************************************************
REM Batch file to run python script
REM ************************************************************

@echo off
cmd /k "cd /d C:\Users\eddyizm\Source\Repos\seleniumTesting\env\Scripts && activate && cd /d  C:\Users\eddyizm\Source\Repos\seleniumTesting && python insta_delete.py >> C:\Users\eddyizm\Source\Repos\seleniumTesting\env\log.txt"  
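
For the Linux/Mac side, a rough crontab equivalent might look like this (the path is a placeholder for wherever your project and virtual environment live, and the five run times are just an example to match my schedule):


# m h dom mon dow  command
0 8,11,14,17,20 * * * cd /path/to/insta_delete && ./env/bin/python insta_delete.py >> env/log.txt 2>&1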


Log file output
Handy for debugging and for keeping track of how long the scrolling takes and how the deleting is progressing. I tail this file to my Dropbox or email it to myself to keep an eye on things.


----------------------------------------------------------------------------------------------------- 
--------------------------------------- new session ------------------------------------------------- 
2018-08-07 11:00:18
----------------------------------------------------------------------------------------------------- 
file size: 0
file empty, going to scroll
2018-08-07 11:00:22
scrolling profile to get more urls
scrolled down: 617 times!
2018-08-07 12:43:22
logging in as mobile device to delete
2018-08-07 12:43:28
length of file: 30
counter: 15
DELETING POSTS!
2018-08-07 12:43:59
POST DELETED: https://www.instagram.com/p/NJM0L/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NLNSX/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NOkLl/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NO2KG/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NPJCZ/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NPSq-/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NPS6H/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NUlgG/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NUnRd/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NX6FM/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NYC8u/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NZL_R/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NcgTf/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NcqAb/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NdS2T/?taken-by=eddyizm
POST DELETED: https://www.instagram.com/p/NdwvP/?taken-by=eddyizm
while loop done and exited successfully
2018-08-07 12:55:06
----------------------------------------------------------------------------------------------------- 
2018-08-07 12:55:08
--------------------------------------- end session ------------------------------------------------- 
----------------------------------------------------------------------------------------------------- 

Result:

Running for roughly one month, I have already removed over 1,600 old posts.



TODO:
I'll be adding a few options to fine-tune the script and make it more reliable.
1. Capture the date of each post in order to delete by date.
2. Get all the image hyperlinks in bulk while scrolling, rather than only from the tail end of the page (see the sketch below).
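
For item 2, the idea would be to harvest hrefs on every pass of the scroll loop instead of grabbing the page source once at the end. A rough, untested sketch reusing parse_href (the name scroll_and_collect is hypothetical):


def scroll_and_collect(browser, max_scrolls=100):
    # variant of scroll_to_end: collect links on every pass instead of
    # only parsing the page source at the tail end
    urls = set()
    last_height = 0
    for _ in range(max_scrolls):
        height = browser.execute_script(
            "window.scrollTo(0, document.body.scrollHeight);"
            "return document.body.scrollHeight;")
        time.sleep(10)
        urls.update(parse_href(browser.page_source))
        if height == last_height:
            break
        last_height = height
    return sorted(urls)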

To contribute or use the code yourself: https://github.com/eddyizm/insta_delete

eddyizm
site: https://eddyizm.com
twitter: https://twitter.com/eddyizm
github: https://github.com/eddyizm
