Wrangling a Pack of Dogs
Aggressive Wrangler
Beautifully Wrangled Dogs

Gather Dog Data

import requests

# Download the image-prediction file programmatically
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)
Twitter archive table
Image prediction table
Twitter API extract table
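
As a minimal sketch of how the downloaded response could become the image prediction table, the snippet below separates the download from the parsing so the parsing step can be tried without network access. The helper names `download_bytes` and `tsv_bytes_to_frame`, the 30-second timeout, and the sample rows are assumptions made for illustration; they do not come from the original notebook.

```python
import io

import pandas as pd


def download_bytes(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    import requests  # imported lazily so the parser below works without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return response.content


def tsv_bytes_to_frame(content: bytes) -> pd.DataFrame:
    """Parse tab-separated bytes into a DataFrame."""
    return pd.read_csv(io.BytesIO(content), sep="\t")


# Made-up sample standing in for the downloaded file (hypothetical values).
sample = (
    b"tweet_id\tjpg_url\n"
    b"1\thttps://example.com/a.jpg\n"
    b"2\thttps://example.com/b.jpg\n"
)
sample_df = tsv_bytes_to_frame(sample)
```

With network access, the same helpers would presumably be combined as `image_prediction_df = tsv_bytes_to_frame(download_bytes(url))`.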

Assess Dog Data

DataFrame.info()
Series.value_counts()
DataFrame.describe()
DataFrame.sample()
DataFrame.isnull()
DataFrame.duplicated()
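
The assessment methods listed above can be combined into one small report per table. This is only a sketch: the helper name `assess` and the toy table are hypothetical, and the real notebook may have inspected each table by hand instead.

```python
import pandas as pd


def assess(df: pd.DataFrame) -> dict:
    """Collect a few programmatic assessment numbers for one table."""
    return {
        "rows": len(df),
        # Number of missing values in each column
        "missing_per_column": df.isnull().sum().to_dict(),
        # Rows that repeat an earlier row exactly
        "duplicated_rows": int(df.duplicated().sum()),
    }


# Made-up toy table (hypothetical values): one missing name, one repeated row.
toy_df = pd.DataFrame(
    {
        "tweet_id": [1, 2, 2, 3],
        "name": ["Rex", "Bo", "Bo", None],
    }
)
report = assess(toy_df)
```

Calling `assess` on each of the three tables gives a quick first look at missing values and duplicates before any cleaning starts.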

Clean Dog Data

# Make copies of original pieces of data
twitter_archive_df_clean = twitter_archive_df.copy()
image_prediction_df_clean = image_prediction_df.copy()
twitterAPI_extract_df_clean = twitterAPI_extract_df.copy()
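
The reason for working on copies can be shown with a toy table: `DataFrame.copy()` makes a deep copy by default, so edits to the clean version leave the original untouched. The column names and values below are invented for illustration.

```python
import pandas as pd

# Made-up toy table (hypothetical values).
original_df = pd.DataFrame({"tweet_id": [1, 2], "rating": [10, 12]})

clean_df = original_df.copy()   # independent copy (deep by default)
clean_df.loc[0, "rating"] = 13  # edit the copy only

original_rating = original_df.loc[0, "rating"]  # still the original value
clean_rating = clean_df.loc[0, "rating"]        # the edited value
```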
Tidiness Issues
Quality
# Merge the archive table (data 1) with the API extract table (data 3)
twitter_archive_df_clean = pd.merge(twitter_archive_df_clean, twitterAPI_extract_df_clean, on='tweet_id', how='outer')

# Merge the result of the merge above with the image prediction table (data 2)
master_twitter_df = twitter_archive_df_clean.merge(image_prediction_df_clean[['tweet_id', 'jpg_url', 'prediction']], how='outer', on='tweet_id')
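
An outer merge keeps every `tweet_id` that appears in either table and fills the missing side with NaN. The sketch below illustrates this on two toy tables; the column `favorite_count` and all values are assumptions for the example, and the `indicator=True` option is added here only to make unmatched rows visible.

```python
import pandas as pd

# Made-up toy tables (hypothetical columns and values).
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "text": ["a", "b", "c"]})
api_extract = pd.DataFrame({"tweet_id": [2, 3, 4], "favorite_count": [5, 7, 9]})

# how="outer" keeps keys from both sides; indicator=True adds a "_merge"
# column saying whether each row came from the left, the right, or both.
merged = pd.merge(archive, api_extract, on="tweet_id", how="outer", indicator=True)

# tweet_ids present in only one of the two tables
only_one_side = sorted(merged.loc[merged["_merge"] != "both", "tweet_id"].tolist())
```

Checking the unmatched keys this way is one option for deciding whether an outer merge, or a stricter `how="inner"`, suits the final dataset.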

Final Dataset

Store Dog Data

Proud Wrangler
master_twitter_df.to_csv('twitter_master.csv', index=False)
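
A quick round trip can confirm that the stored file reads back as expected. This sketch writes to an in-memory buffer instead of `twitter_master.csv` so it leaves no file behind; the toy table is invented for illustration.

```python
import io

import pandas as pd

# Made-up toy table standing in for the master dataset (hypothetical values).
toy_master = pd.DataFrame({"tweet_id": [10, 20], "rating": [11, 13]})

# index=False leaves the row index out of the CSV, so reading it back
# does not produce an extra unnamed column.
buffer = io.StringIO()
toy_master.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

restored = pd.read_csv(io.StringIO(csv_text))
```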