Downloading all the images in an exported observations file

Here's a Python alternative to Armand's PHP script for downloading all the images in an exported observations file.
Note that the CSV contains only the URL of the first image, even when an observation has several images.
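
Because the export carries only the first photo, one possible way to reach the others is to ask the iNaturalist API about each observation. The sketch below is not part of the original script. It assumes the v1 endpoint `https://api.inaturalist.org/v1/observations/<id>` and assumes each entry under `results[0]["photos"]` has a `url` field naming a small "square" rendition; check both against the current API documentation. The function names are made up for illustration.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current API documentation.
API_URL = "https://api.inaturalist.org/v1/observations/{}"


def photo_urls_from_response(response, size="original"):
    """Extract photo URLs from a decoded observation response (a dict).

    Assumes the layout response["results"][0]["photos"][i]["url"], where
    each URL names a rendition such as ".../square.jpg".
    """
    results = response.get("results", [])
    if not results:
        return []
    urls = []
    for photo in results[0].get("photos", []):
        url = photo.get("url")
        if url:
            # Swap the assumed "square" rendition for the requested size
            urls.append(url.replace("/square.", "/{}.".format(size)))
    return urls


def fetch_photo_urls(observation_id, size="original"):
    """Query the API for one observation and return all of its photo URLs."""
    with urllib.request.urlopen(API_URL.format(observation_id), timeout=30) as r:
        return photo_urls_from_response(json.load(r), size)
```

Calling `fetch_photo_urls` with a value from the CSV's `id` column would then give one URL per photo, if the assumptions above hold.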

https://groups.google.com/forum/#!topic/inaturalist/VbU36xDeP3k
When you export the observations, either from http://www.inaturalist.org/observations/export or from the links on your project page, the CSV you get has an image_url column with a link to each observation's photo.

There are browser plugins and little freeware programs that fetch photos from a list of URLs, or you can write a very simple script (Perl, PHP, etc.) to do it.
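
As a minimal illustration of the "list of URLs" approach, here is a sketch that turns an exported CSV into (URL, output path) pairs using only the standard library. It relies on the `id` and `image_url` columns; the function name, the `.jpg` extension, and the idea that observations without photos leave `image_url` empty are assumptions.

```python
import csv
import io
import os


def list_downloads(csv_file, output_dir):
    """Return (url, output_path) pairs for every row that has an image_url."""
    downloads = []
    for row in csv.DictReader(csv_file):
        url = (row.get("image_url") or "").strip()
        if not url:
            # Assumed: observations without photos leave image_url empty
            continue
        downloads.append((url, os.path.join(output_dir, row["id"] + ".jpg")))
    return downloads


# Tiny in-memory stand-in for an exported file
example = io.StringIO(
    "id,image_url\n"
    "101,https://example.org/a.jpg\n"
    "102,\n"
)
pairs = list_downloads(example, "images")
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass each pair to `urllib.request.urlretrieve(url, output_path)`.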

#%% Constants and imports

import os
import sys
import pandas as pd
import humanfriendly
import urllib
import urllib.request
from multiprocessing.pool import ThreadPool
import time

OBSERVATION_LIST_FILE = r"D:\temp\observations-38841.csv\observations-38841.csv"
IMAGE_OUTPUT_DIR = r"D:\temp\observations-38841.csv"

# Concurrent download threads

nThreads = 250

overwriteFiles = True

# Create the output directory if necessary

if not os.path.exists(IMAGE_OUTPUT_DIR):
    os.makedirs(IMAGE_OUTPUT_DIR)

#%% Read observations

print('Reading observation .csv file')
df = pd.read_csv(OBSERVATION_LIST_FILE)
print('Read {} observations'.format(len(df)))

#%% Enumerate images we want to download

ids = set()
indexedUrlList = []

for iImage, row in df.iterrows():
    url = row['image_url']
    id = row['id']
    # Make sure the ID is unique
    assert id not in ids
    ids.add(id)
    outputFileName = os.path.join(IMAGE_OUTPUT_DIR,'{}.jpg'.format(id))
    indexedUrlList.append([iImage,url,outputFileName])
    print('Fetching URL {} to {}'.format(url,outputFileName))

nImages = len(indexedUrlList)

#%% Download files in parallel (download function)

errorList = [''] * nImages

# Input should be a 3-element list: index, url, output filename
#
# Returns url, imageFilename, error
def fetch_url(indexedUrl,nImages):

    assert len(indexedUrl) == 3

    iImage = indexedUrl[0]
    url = indexedUrl[1]
    imageFilename = indexedUrl[2]
    parentDir = os.path.dirname(imageFilename)

    doDownload = True

    # Check whether the file already exists
    if os.path.exists(imageFilename):

        if overwriteFiles:
            print("File {} exists, over-writing".format(imageFilename))
        else:
            print("File {} exists, skipping".format(imageFilename))
            errorList[iImage] = 'skipped'
            doDownload = False

    if doDownload:

        # Make the parent directory if necessary
        os.makedirs(parentDir, exist_ok=True)

        # Download the file
        print("Downloading file {} of {} ({}) to {}".format(iImage,nImages,url,imageFilename))

        try:
            urllib.request.urlretrieve(url, imageFilename)
            errorList[iImage] = 'success'
            sizeString = humanfriendly.format_size(os.path.getsize(imageFilename))
            print("Downloaded file {} to {} ({})".format(url,imageFilename,sizeString))

        except Exception as e:
            # Record the failure and keep going with the other downloads
            s = "Error downloading file {}: {}".format(url,e)
            print(s)
            errorList[iImage] = s

    return url,imageFilename,errorList[iImage]

#%% Download files in parallel (loop)

# https://stackoverflow.com/questions/16181121/a-very-simple-multithreading-parallel-url-fetching-without-queue

# time.time() returns the time in seconds since the epoch
start = time.time()
pool = ThreadPool(nThreads)

# results = pool.imap_unordered(lambda x: fetch_url(x,nImages), indexedUrlList)
results = pool.map(lambda x: fetch_url(x,nImages), indexedUrlList)

nErrors = 0
nSuccess = 0
nSkipped = 0

for url,imageFilename,errorString in results:
    # Compare strings with ==, not "is" (identity comparison is unreliable for strings)
    if errorString == 'success':
        nSuccess = nSuccess + 1
    elif errorString == 'skipped':
        nSkipped = nSkipped + 1
    else:
        print("Error fetching {}: {}".format(url, errorString))
        nErrors = nErrors + 1

downloadTime = time.time() - start

print("Elapsed time: {}".format(humanfriendly.format_timespan(downloadTime)))

# Optional: requires a separate estimate_ss_download module that is not imported above
# estimate_ss_download.estimate_ss_download(nImages,-1,downloadTime)

print("Attempted to download {} images".format(nImages))

print("{} succeeded, {} skipped, {} errors".format(nSuccess,nSkipped,nErrors))

sys.stdout.flush()
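
The script above opens 250 threads against a single host and gives each URL one attempt. A gentler variant might use a smaller pool and retry failures. The following is only a sketch of that idea, not part of the original script: the function names, the worker count, and the back-off policy are all assumptions, and `fetch` stands for whatever callable actually downloads one URL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url), retrying on any exception; return (url, error or None)."""
    last_error = None
    for attempt in range(attempts):
        try:
            fetch(url)
            return url, None
        except Exception as e:
            last_error = "{}: {}".format(type(e).__name__, e)
            if attempt < attempts - 1:
                # Simple linear back-off before the next attempt
                time.sleep(delay * (attempt + 1))
    return url, last_error


def download_all(fetch, urls, n_workers=8, attempts=3, delay=1.0):
    """Run fetch over urls on a small thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda u: with_retries(fetch, u, attempts, delay), urls))
```

Here `fetch` could wrap `urllib.request.urlretrieve` with a lookup from URL to output filename. Each result is a `(url, error)` pair, with `error` set to `None` on success.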

Posted on October 13, 2018, 08:46 PM by ahospers

Comments

iNaturalist

Open source Rails app behind iNaturalist.org

Want to help out? Fork the project and check out the Development Setup Guide (might be a bit out of date, contact kueda if you hit problems getting set up).

Thinking about running your own version of iNaturalist? Consider joining the iNaturalist Network instead of forking the community. https://github.com/inaturalist/inaturalist

https://www.inaturalist.org/pages/tips_tricks_nz
Search Term and Tricks
https://groups.google.com/forum/#!topic/inaturalist/vqQH4FmChfE
Russell Pfau's iNat tips & tricks
https://www.inaturalist.org/people/pfau_tarleton
Cassi Saari's iNat tips & tricks
https://www.inaturalist.org/journal/bouteloua/14205-inat-tips-tricks

Posted by ahospers more than 5 years ago

But you're welcome to write your own script that posts through our API
https://www.inaturalist.org/pages/developers

Here's a Ruby snippet that creates users and posts observations over the API:

require 'rubygems'
require 'rest_client'
require 'json'

# First, enter your app_id, app_secret, and redirect_uri from
# http://gorilla.inaturalist.org/oauth/applications/206

site = "http://gorilla.inaturalist.org"
app_id = '308714d38eaf78ed57c11c0790f639d7d05e86cb7564f641629116e5b3bea024'
app_secret = '8be0ee61e1b7858a14c050b982eb3e6447b2d674075f3d8c58c8759ed2ee02a6'
redirect_uri = 'http://www.bd.dix/utils/migratelanding.cfm'

# Next, visit this link in your browser while logged in as 'tegenligger'
# http://gorilla.inaturalist.org/oauth/authorize?client_id=308714d38eaf78ed57c11c0790f639d7d05e86cb7564f641629116e5b3bea024&redirect_uri=http%3A%2F%2Fwww.bd.dix%2Futils%2Fmigratelanding.cfm&response_type=code
# and get your auth code

auth_code = "d9c5335b17c0ec05f7444b1673c675b16a9a4d77f2d499b852778888503760a3"

# Next, get a token for tegenligger

payload = {
  :client_id => app_id,
  :client_secret => app_secret,
  :code => auth_code,
  :redirect_uri => redirect_uri,
  :grant_type => "authorization_code"
}
response = RestClient.post("#{site}/oauth/token", payload)
token = JSON.parse(response)["access_token"]
headers = {"Authorization" => "Bearer #{token}"}

# Now make a user using tegenligger's token

username = 'testuser1'
email = 'testuser1@bar.net'
password = 'testuser1password'

results = RestClient.post("#{site}/users.json", {"user[login]" =>
  username, "user[email]" => email, "user[password]" => password,
  "user[password_confirmation]" => password}, headers)
puts "created http://gorilla.inaturalist.org/users/#{JSON.parse(results)["id"]}"

# Now get a token for testuser1

payload = {
  :client_id => app_id,
  :client_secret => app_secret,
  :grant_type => "password",
  :username => username,
  :password => password
}
puts "POST #{site}/oauth/token, payload: #{payload.inspect}"
response_for_user1 = RestClient.post("#{site}/oauth/token", payload)
# Parse testuser1's own response and token, not tegenligger's
token_for_user1 = JSON.parse(response_for_user1)["access_token"]
headers_for_user1 = {"Authorization" => "Bearer #{token_for_user1}"}

# Now make an observation on behalf of testuser1

results = RestClient.post("#{site}/observations.json",{
  "observation[species_guess]" => "Northern Cardinal",
  "observation[taxon_id]" => 9083,
  "observation[observed_on_string]" => "2013-01-03",
  "observation[time_zone]" => "Eastern Time (US %26 Canada)",
  "observation[description]" => "what a cardinal",
  "observation[tag_list]" => "foo,bar",
  "observation[place_guess]" => "clinton, ct",
  "observation[latitude]" => 41.27872259999999,
  "observation[longitude]" => -72.5276073,
  "observation[map_scale]" => 11,
  "observation[location_is_exact]" => false,
  "observation[positional_accuracy]" => 7798,
  "observation[geoprivacy]" => "obscured"
}, headers_for_user1)

puts "created http://gorilla.inaturalist.org/observations/#{JSON.parse(results)[0]["id"]}"

# Now make another user using tegenligger's token

username = 'testuser2'
email = 'testuser2@bar.net'
password = 'testuser2password'

results = RestClient.post("#{site}/users.json", {"user[login]" =>
  username, "user[email]" => email, "user[password]" => password,
  "user[password_confirmation]" => password}, headers)
puts "created http://gorilla.inaturalist.org/users/#{JSON.parse(results)["id"]}"

# Now get a token for testuser2

payload = {
  :client_id => app_id,
  :client_secret => app_secret,
  :grant_type => "password",
  :username => username,
  :password => password
}
puts "POST #{site}/oauth/token, payload: #{payload.inspect}"
response_for_user2 = RestClient.post("#{site}/oauth/token", payload)
# Parse testuser2's own response and token, not tegenligger's
token_for_user2 = JSON.parse(response_for_user2)["access_token"]
headers_for_user2 = {"Authorization" => "Bearer #{token_for_user2}"}

# Now make an observation on behalf of testuser2

results = RestClient.post("#{site}/observations.json",{
  "observation[species_guess]" => "Northern Cardinal",
  "observation[taxon_id]" => 9083,
  "observation[observed_on_string]" => "2013-01-03",
  "observation[time_zone]" => "Eastern Time (US %26 Canada)",
  "observation[description]" => "what a cardinal",
  "observation[tag_list]" => "foo,bar",
  "observation[place_guess]" => "clinton, ct",
  "observation[latitude]" => 41.27872259999999,
  "observation[longitude]" => -72.5276073,
  "observation[map_scale]" => 11,
  "observation[location_is_exact]" => false,
  "observation[positional_accuracy]" => 7798,
  "observation[geoprivacy]" => "obscured"
}, headers_for_user2)

puts "created http://gorilla.inaturalist.org/observations/#{JSON.parse(results)[0]["id"]}"
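
For readers following the Python script earlier in this post, the first two steps of the Ruby snippet (building the authorization link and the token request fields) might look like this in Python. It is only a sketch that mirrors the parameter names used in the Ruby code above; the function names and the placeholder values are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode


def authorize_url(site, client_id, redirect_uri):
    """Build the URL a logged-in user visits to obtain an authorization code."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
    })
    return "{}/oauth/authorize?{}".format(site, query)


def token_payload(client_id, client_secret, code, redirect_uri):
    """Form fields to POST to <site>/oauth/token in exchange for an access token."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "redirect_uri": redirect_uri,
        "grant_type": "authorization_code",
    }


# Placeholder values, not real credentials
url = authorize_url("http://gorilla.inaturalist.org", "APP_ID",
                    "http://www.example.com/landing")
```

The resulting dictionary would be sent as form data to `<site>/oauth/token`, as the Ruby code does with `RestClient.post`.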

Posted by ahospers almost 5 years ago
