How can activity posts be quickly collected into a summary?

This post was edited by [ nearby ] on 2022-04-10 21:18:10. If there is a problem, please report it to the moderator or the forum administrators for deletion.

妖妖靈 of 美語世界 (is she a moderator?) came to ask 鄰兄 how to compile activity posts into a summary. So 鄰兄 did two things:

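  1. Asked her to call 鄰兄 '虎哥' (Tiger Bro) (she did). Then, in for a penny, in for a pound: to serve everyone, I am making the program public here. 鄰兄 is a peerless master of Java and Excel (don't come asking me for help, I really have no time to answer questions), but not a Python expert. I only just started learning Python, which is why I now write everything in Python, to get familiar with it.
  2. Added many explanatory comments to 鄰兄's Python program for her.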
 
You can copy/paste the code below into a Python file. If you have Python 3 installed on your computer, you can then follow the prompts to make collecting activity posts almost fully automatic.

Good luck! No follow-up questions will be answered!
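The program below needs Python 3 and the third-party `requests` package, which is not part of the standard library. If you are unsure whether both are available, a small pre-flight check like the following may help. It is a hypothetical helper (the name `check_environment` is made up) and not part of 鄰兄's program; it only inspects the local installation and does not contact wenxuecity.

```python
# Hypothetical pre-flight check (not part of 鄰兄's program).
# It only inspects the local Python installation; nothing is downloaded.
import importlib.util
import sys


def check_environment():
    """Return (python_version_ok, requests_installed) for this interpreter."""
    python_version_ok = sys.version_info >= (3, 0)
    # find_spec returns None when a top-level package cannot be found
    requests_installed = importlib.util.find_spec("requests") is not None
    return python_version_ok, requests_installed


if __name__ == "__main__":
    version_ok, has_requests = check_environment()
    print("Python", sys.version.split()[0], "OK" if version_ok else "too old, please install Python 3")
    print("requests:", "OK" if has_requests else "missing, try: pip install requests")
```

Save it as a separate file and run it once; if `requests` is reported missing, `pip install requests` is the usual fix.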
 
 
# Author: 書香之家版主 nearby, March 2022
#
# Usage of this Python program:
# 0. Make sure that you have Internet access and Python 3 installed on your computer (or use a cloud environment),
#    plus the third-party 'requests' package (install it with: pip install requests).
# 1. Place this file in a folder, say, a folder named "wxc".
# 2. Go to your '論壇' (forum) and search for your '活動' (activity) title. You will get one or more pages of results. Note how many pages there are.
#       If you do not know how to do this, just skip this step; the program then assumes there are 3 pages (150 entries, more than usual).
# 3. Run this program from that folder. You will be prompted for the name of your activity and
#    the number of pages you found in step 2 (if you do not know the number of pages, just hit ENTER).
#    Example:
#               春天的暢想
#               3 (or hit the ENTER key)
# 4. You will also be prompted for your 論壇's name in Latin letters. You can find it in your 論壇's URL.
#    For example, 書香之家 has the URL https://bbs.wenxuecity.com/sxsj/, so its English name is sxsj.
#    Other examples: 美語世界 is mysj, 文化走廊 is culture, 詩詞欣賞 is poetry, etc.
# 5. The result is written to 'sxzj-out.html' in the folder you ran the program from (here 'wxc/sxzj-out.html').
#    Copy/paste the source code of 'sxzj-out.html' into a new WXC page. Done!
#
#
# Note: By default the entries are listed in reverse chronological order.
# If you need them in chronological order instead,
# comment out the statement mylist.reverse() by placing # in front of it, like: #mylist.reverse()
#
#

import requests


# Lines containing any of these strings are not activity entries:
# replies (跟帖), search-box labels, and pager links (首頁, 上一頁, 下一頁, 尾頁, ...).
notargets = ['跟帖', '輸入關鍵詞', '內容查詢', 'input name', '當前', '首頁', '上一頁', '尾頁', '下一頁']
notargets.append('archive')
# This is how SXZJ (書香之家) works: when 無憂 starts an activity, she always marks her activity post like this.
notargets.append('##活動##')
# notargets.append('匯總')


def isInside(line, notargets_array):
    # True if the line contains any of the excluded strings
    return any(t in line for t in notargets_array)
# END

# The line looks like <a href="/sxsj/76799.html" target="_blank"><em>春天的暢想</em>】春天屬於女人</a>
# It needs to be <a href="https://bbs.wenxuecity.com/sxsj/76799.html" target="_blank"><em>春天的暢想</em>】春天屬於女人</a>
def addHttp(line):
    if 'href="' not in line:
        return line  # no link on this line, nothing to rewrite
    at = line.split('href="', 1)
    return '<a href="https://bbs.wenxuecity.com' + at[1]
# END

def processOneFile(target, response, mylist):
    # split the page text by newline characters to get a list of strings
    lines = response.text.split('\n')
    length = len(lines)
    i = 0
    while i < length:
        line = lines[i]
        # an entry spans 3 lines: the link, the author line, and one more line;
        # skip matches without a link or too close to the end of the page
        if (target in line) and (not isInside(line, notargets)) and ('href="' in line) and (i + 2 < length):
            line = addHttp(line)
            print(line)
            i = i + 1
            line2 = lines[i]
            # looks like: [書香之家] - <strong>WXCTEATIME</strong>(6987 bytes ), needs to be WXCTEATIME only
            parts = line2.replace('</strong>', '<strong>').split('<strong>')
            line2 = parts[1] if len(parts) > 1 else ''
            i = i + 1
            line3 = lines[i]
            line += "  " + line2 + "  " + line3
            mylist.append(line)
        i = i + 1
# END of FUNCTIONS


# ---- main starts here ----

print()
print('# Author: 書香之家版主 nearby, March 2022')
print()

target = input('What is the title of your activity (活動)?:  ').strip()
pages = 3  # default, i.e. at most 150 entries
temp = input('How many pages are there when you search for the activity on WXC? (If you do not know, just hit ENTER): ').strip()
try:
    pages = int(temp)
except ValueError:
    pass  # empty or non-numeric answer: keep the default of 3 pages

subid = 'sxsj'
temp = input('What is the name of your 論壇 in English? For example, 書香之家 is sxsj, 美語世界 is mysj, 文化走廊 is culture, 詩詞欣賞 is poetry: ').strip()
if len(temp) >= 2:
    subid = temp

mylist = []
search_url = 'https://bbs.wenxuecity.com/bbs/archive.php'
# requests builds and percent-encodes the query string from this dict,
# so Chinese titles and characters such as '&' or spaces are sent correctly
params = {'SubID': subid, 'pos': 'bbs', 'keyword': target, 'username': ''}

f = requests.get(search_url, params=params)
processOneFile(target, f, mylist)
for i in range(1, pages):
    f = requests.get(search_url, params={'page': i, **params})
    processOneFile(target, f, mylist)

mylist.reverse()
# this is the output file; 'with' closes it even if an error occurs while writing
with open('sxzj-out.html', 'w', encoding='utf-8') as html2:
    for li in mylist:
        html2.write('<p>' + li + '\n')

print("\n")
print(str(len(mylist)) + " entries")
print("\n")
print("Please check the file sxzj-out.html. The result is in it! Thanks for using this program. ---- 虎哥 / Nearby / 鄰兄")
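For readers who want to see what the program extracts without touching the network, here is an offline sketch of its two per-page steps: building the archive search URL and picking entries out of the page text. It mirrors `processOneFile`, `isInside` and `addHttp` under the same assumption the program makes, namely that every hit spans three consecutive lines (the link, the `[forum] - <strong>author</strong>` line, and one more line copied as-is). That layout comes from the program's comments, not from any documented wenxuecity format, and the names `build_search_url` and `parse_entries` are hypothetical. One deliberate difference from plain string concatenation: `urllib.parse.urlencode` percent-encodes the Chinese title, so a title containing spaces or `&` still yields a valid query string.

```python
# Offline sketch of the per-page work done by the summary program.
# Assumption (from the program's comments, not a documented site format):
# each hit spans three lines: the link, the author line, and one more line.
from urllib.parse import urlencode

BASE = "https://bbs.wenxuecity.com"
# Same exclusion strings as the program: replies, search-box labels, pager links, markers
NOTARGETS = ['跟帖', '輸入關鍵詞', '內容查詢', 'input name', '當前', '首頁',
             '上一頁', '尾頁', '下一頁', 'archive', '##活動##']


def build_search_url(subid, keyword, page=None):
    """Build the archive search URL; urlencode percent-encodes the keyword as UTF-8."""
    params = {"SubID": subid, "pos": "bbs", "keyword": keyword, "username": ""}
    if page is not None:
        params = {"page": page, **params}  # the program puts page= first
    return BASE + "/bbs/archive.php?" + urlencode(params)


def parse_entries(page_text, target):
    """Return one summary line ("link  author  third-line") per matching hit."""
    lines = page_text.split("\n")
    entries = []
    i = 0
    while i < len(lines):
        line = lines[i]
        is_hit = (target in line and 'href="' in line
                  and not any(t in line for t in NOTARGETS))
        if is_hit and i + 2 < len(lines):
            # relative href "/sxsj/76799.html" becomes an absolute link
            link = '<a href="' + BASE + line.split('href="', 1)[1]
            # "[書香之家] - <strong>WXCTEATIME</strong>(6987 bytes )" -> "WXCTEATIME"
            parts = lines[i + 1].replace('</strong>', '<strong>').split('<strong>')
            author = parts[1] if len(parts) > 1 else ''
            entries.append(link + "  " + author + "  " + lines[i + 2])
            i += 3  # skip past the two lines consumed for this hit
        else:
            i += 1
    return entries
```

For instance, passing the two sample lines quoted in the program's comments plus any third line to `parse_entries` with the target `春天的暢想` gives a single summary line that starts with `<a href="https://bbs.wenxuecity.com/sxsj/76799.html"` and names `WXCTEATIME` as the author.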

All replies:

Like! -WXCTEATIME- 04/10/2022 08:34:44

Like! -可能成功的P- 04/10/2022 08:41:00

Kudos to 鄰兄: the spirit of sharing is admirable... and kudos to 鄰兄's wisdom too, such as his firm refusal to say anything more... :) -塵凡無憂- 04/10/2022 08:48:48

By the way, 妖妖靈 is the moderator of the 美語 forum. :) Also, the words "peerless master" knocked me out.... LOL -塵凡無憂- 04/10/2022 08:53:13

Just curious: I didn't see her among the moderators' names. Who is she? :-) As for the bragging, a reputation is built by blowing your own horn :-) -nearby- 04/10/2022 08:57:38

LOL, kudos for the horn-blowing power.... :) -塵凡無憂- 04/10/2022 09:05:00

She is a moderator; a person can own many outfits (IDs), right? :) -WXCTEATIME- 04/10/2022 09:06:21

Once the program is started, you only need to enter the activity name and everything is taken care of. 鄰兄 won't be replying further, ha. Thanks to the friends at 書香 (and to 茶兄, 小p and 憂憂 above) -nearby- 04/10/2022 08:55:52

Bravo! -lovecat08- 04/10/2022 08:57:07

Posting a program without even writing a manual to go with it is actually pretty wicked -kirn- 04/10/2022 09:08:24

I have to criticize 小k: half of the program is the manual, explaining twice how to use it -nearby- 04/10/2022 09:11:00

As someone who has used similarly simple Python ("big snake") programs, I can sadly tell you that what confused me were the file names and such. Unless you use it often, you forget it in no time.. even which -kirn- 04/10/2022 09:27:02

Actually, those who understand it get it at a glance; for those who don't, there is too much to catch up on... 鄰兄 is sharing this for free, after all; this work should really be done by 文學城's technical department... -塵凡無憂- 04/10/2022 09:11:35

Is there a technical department? I thought it was mainly a marketing department... The volunteers, on the other hand, are all amazingly skilled -kirn- 04/10/2022 09:28:49

There is. :) -塵凡無憂- 04/10/2022 10:39:51

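A 鄰兄 who doesn't want to be a moderator is not a good kitty.. I'll walk around this one. :) -魯冰花- 04/10/2022 09:28:50

Wow wow wow, 虎哥 is a real living Lei Feng (a model of selfless help)!!! Thanks so much!!! Taking it home right away to study it carefully!!! -妖妖靈- 04/10/2022 11:08:24

Hope this helps 妖妹. 虎哥 tried it on four forums, in particular on yours and your activity, and it worked every time. The first time I compiled an activity summary I also did it by hand, and it wore me out :-) -nearby- 04/10/2022 12:49:12

Kudos to moderator 鄰: excellent at both writing and high tech. -莊文雅- 04/10/2022 11:29:27

Real talent! I'm dizzy, taking a detour… :~) -老林子裏的夏天- 04/10/2022 11:39:13

鄰兄 has shown that bringing some technical innovation to the forums is actually not hard. I have suggested several times that the forums pilot a mode where replies are not displayed but a thread automatically moves up whenever it gets a reply; that is not very hard either. 近兄 should be 文城's technical consultant -老鍵- 04/10/2022 11:39:24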

老鍵, come and join the activity... :) -塵凡無憂- 04/10/2022 13:53:52

Ah, I didn't notice you were running an activity. A programming contest? I'm not bad at Python -老鍵- 04/10/2022 14:31:39

Haha. It's the 人間情色 (romance) activity. I've read your romance writing.... LOL -塵凡無憂- 04/10/2022 15:52:21

Forgot to mention: 鄰兄, please ask the webmaster to pin this post on the right side of the forum so it is kept for reference... -塵凡無憂- 04/10/2022 14:13:27

I don't understand it, but it sounds impressive. 鄰兄 rocks! -浮雲馳- 04/10/2022 14:55:55

Kudos to 鄰兄, so full of kindness! -applebee3- 04/10/2022 15:34:09
